Early, low-cost detection device for heart failure

Six months ago we registered a new project on Hackaday and a few other places.

The idea was to detect heart failure early: it is a condition that affects most of us as we age, and there is a lot of material available online thanks to various challenges on the subject, like this one:

To create a proof of concept, we used a low-cost fetal doppler ($50) and a Linux box, and were able to record heart sounds on an adult without using gel. So one of the requirements for medical devices was met: being ready to use in seconds.

In most medical devices there is an implicit requirement: to make the output understandable, the device must offer an explanation of its medical statement. So a black-box ML model à la Kaggle is out of the question.

In heart sound competitions like PhysioNet 2016, participants train HMMs to build a statistical model of the heart sounds of a given condition. An HMM can “explain” its internal model by showing the probability of appearance of each state, for example the probability of an S2 sound arriving at some time after an S1 sound in a particular sequence of heart sounds.
An HMM model can therefore classify a new sequence of heart sounds as either quite similar to the trained model or not.
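The decoding step behind such a classification can be sketched with a toy Viterbi pass over a two-state S1/S2 model. All probabilities and observation symbols below are invented for illustration; they are not the values used in our device.

```python
import math

# Toy two-state model of a heart-sound sequence: S1 ("lub") and S2 ("dub")
# alternate. All probabilities are illustrative, not clinical values.
states = ["S1", "S2"]
start = {"S1": 0.9, "S2": 0.1}
trans = {"S1": {"S1": 0.1, "S2": 0.9},   # an S2 usually follows an S1
         "S2": {"S1": 0.9, "S2": 0.1}}
emit = {"S1": {"lub": 0.7, "dub": 0.1, "noise": 0.2},
        "S2": {"lub": 0.1, "dub": 0.7, "noise": 0.2}}

def viterbi(obs):
    """Most probable state path for an observation sequence, with its log-probability."""
    V = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    paths = {s: [s] for s in states}
    for o in obs[1:]:
        V.append({})
        new_paths = {}
        for s in states:
            lp, prev = max((V[-2][p] + math.log(trans[p][s]), p) for p in states)
            V[-1][s] = lp + math.log(emit[s][o])
            new_paths[s] = paths[prev] + [s]
        paths = new_paths
    best = max(states, key=lambda s: V[-1][s])
    return paths[best], V[-1][best]

best_path, logp = viterbi(["lub", "dub", "lub", "dub"])
```

On a real recording, the observations would be features extracted from the doppler signal, and the probabilities would be learned from tagged data.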

One might ask why not use deep learning, since it seems to have made wide strides recently and very nice software such as TensorFlow is available.
There is a big internal difference between ML with CNNs à la TensorFlow and ML with HMMs: in “an ideal world” a CNN finds its features without human intervention, whereas an HMM needs each observation to be “tagged” with some human knowledge through a Viterbi or similar function. The tagging is part of what makes the resulting model understandable; automatic tagging (as in unsupervised learning), however, is genuinely hard.

In truth there is a similarity between the design of successful CNNs and HMMs: both have a cost function. CNN cost functions, however, do not create meaning.
Designing the cost function of a CNN or the Viterbi function of an HMM is the most important part of any ML setup. All the claims we hear about the effectiveness of ML come down to the design of those functions, not to some fancy ML algorithm.
It is a very hard job, far beyond the state of the art.

To circumvent this problem, most ML proposals use another ML setup to create the cost function, as in most PhysioNet 2016 entries or in a recent, highly regarded article in the domain of skin cancer detection: http://www.nature.com/nature/journal/v542/n7639/abs/nature21056.html.

Indeed, if one uses ML to create the cost function, the resulting model becomes highly opaque, and medical policy makers, scientists, and specialists will find it useless or even dangerous.

In the long term, this practice of using an ML-derived cost function will be discouraged, but I suppose it is part of the current hype around “deep learning”. It is worse with small-signal ML such as the Deep Forest algorithm, where it is (today) impossible to reverse engineer the ML model by perturbing it. In addition, deep learning cannot be done on a $50 device; it demands huge computing facilities.

So we created our own Viterbi function for our HMM, and it is quite efficient while remaining quite simple. The next steps are to improve it, make it more informative, and move it from the Linux box to a microcontroller. Stay tuned.


AIDA as a tool to assist and audit the science process

DARPA’s new AIDA program may (also) help provide a better understanding of science publications and results by helping to separate interesting from irrelevant data.

Information complexity has exceeded scientists’ capacity to glean meaningful or actionable insights, and doing science gets harder as time passes. World-class experts in one field will not understand statements made by scientists even slightly outside their field.
The situation is even worse for other science stakeholders, such as science managers and policy makers, who have a long-standing interest in developing and maintaining a strategic understanding and evaluation of scientific activity, field landscapes, and trends. Information obtained from scientific publishing is often analyzed out of context. Because of the complexity and superabundance of information, independent analysis often results in inaccurate interpretation.
It would be interesting to overcome the noisy and often conflicting assertions of today’s scientific publishing environment through common tooling. Some efforts have already been made, for example the excellent Galaxy tool in biology and, to a lesser extent, notebook interfaces like Jupyter in coding. Another interesting trend is pre-print activity, which helps scientists share work unsuitable for formal publication.
DARPA’s AIDA program aims to create technology capable of aggregating and mapping pieces of information. AIDA may provide a multi-hypothesis “semantic engine” that automatically mines multiple publishing sources, extracts their common foreground assertions and background knowledge, and then generates and explores multiple hypotheses interrogating their true nature and implications.
The AIDA program hopes to determine a confidence level for each piece of information, as well as for each hypothesis generated by the semantic engine. The program will also endeavor to digest and make sense of information or data in its original form and then generate alternate contexts by adjusting or shifting variables and probabilities in order to enhance accuracy and resolve ambiguities in line with real-science expectations.
Even structured data can vary in the expressiveness, semantics, and specificity of their representations. AIDA has the potential to help scientists and science decision makers refine their analyses so that they are more in line with the larger and more complete overall context, and in doing so achieve a more thorough understanding of the elements and forces shaping science.

Low-cost, non-invasive Continuous Glucose Monitoring using Raman spectroscopy

A high-quality, low-cost, non-invasive Continuous Glucose Monitor (CGM), based mainly on Raman spectroscopy, is presented. In addition, a number of sensors provide information about the patient’s context. The CGM recalibrates itself automatically.

Designing non-invasive continuous glucose monitoring (CGM) is an incredibly complex problem presenting a number of challenging medical and technological hurdles. Reportedly, around 70 companies have tried to bring non-invasive glucose monitoring devices to market, without success.

Quality in our CGM proposal comes from the number of technologies used to increase measurement precision. Understanding the biological operating context makes it possible to accurately predict glucose values.

More information here: Glucose_monitor

Analysing eye biomarkers at home with passive infrared radiation

Currently there is no portable device that can check diseases of the aging eye such as glaucoma, age-related macular degeneration, diabetic retinopathy, Alzheimer’s disease, cataract, clinically significant macular edema, keratoconjunctivitis sicca (dry eye disorder), Sjögren’s syndrome, retinal hard exudates, ocular hypertension, and uveitis.

We propose a portable device which, when placed before one eye but without any physical contact, analyzes the eye’s natural infrared spectrum to detect molecules that reveal a potential medical condition. If a biomarker is detected, the device asks the user to consult a medical doctor, with an indication of urgency but without disclosing any medical information. The doctor, on the contrary, can securely access a wealth of information without needing a dedicated device.

The medical doctor offers this tool to the patient, and remains in control of the device and of the relationship she has with her patient.

More information here: passive_eye_care

Mesosphere light scattering as a cell tower substitute

Modern wireless technology cannot transmit energy and information with a good enough SNR over 80 km and beyond the Earth’s curvature, in portable low-cost devices, under current regulations.

We propose a very different approach based on astronomy technology: a laser emits light vertically, generating a luminous dot at high altitude (similar to astronomy’s guide star), and this light is detected at very long distance. By modulating the luminosity of this guide star, it is possible to transmit information. The technique works even under cloudy skies and in daylight.

There is no need to build any network infrastructure. Each cell in a field can reach the base station even at 80 km. The cost per field station is under $9,000, and field stations can be moved at will.
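To make the modulation idea concrete, here is a minimal on-off keying sketch, under the assumption that the receiver simply samples the guide star’s brightness; all names and rates are illustrative, not part of our actual design.

```python
# On-off keying over a light source: 1 = guide star bright, 0 = dim.
# samples_per_bit stands in for the receiver's oversampling; all values
# here are illustrative, not a real link design.

def encode(message, samples_per_bit=4):
    """ASCII -> brightness samples, each bit held for samples_per_bit samples."""
    samples = []
    for byte in message.encode("ascii"):
        for i in range(7, -1, -1):          # MSB first
            samples.extend([(byte >> i) & 1] * samples_per_bit)
    return samples

def decode(samples, samples_per_bit=4):
    """Average each bit period, threshold, then repack bits into bytes."""
    bits = [round(sum(samples[i:i + samples_per_bit]) / samples_per_bit)
            for i in range(0, len(samples), samples_per_bit)]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return out.decode("ascii")

received = decode(encode("SOS"))
```

A real link would of course add synchronization, error correction, and a brightness threshold adapted to sky background, which this sketch omits.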

More information here: base_station_for_deserts

What becomes of old biology tools?

One of my colleagues remarked today that a lot of old biology software tools and libraries are designed by academics and abandoned as soon as their interest (and funding) switches to something else.
It is therefore very dangerous for a community or a business to rely on such tools: when something goes wrong, there is no expert in sight ready to offer a helping hand.
I wonder whether something could be done to improve this situation. At the very least, a list of abandoned biology software could be maintained.

Ontologies are not magic wands

Some 15 years ago, ontologies were the big thing. Financing an EU project was easy if ontologies and semantics were mentioned as primary goals.
That time is gone, except in biology, where ontologies are still used, often in a very different way from what was originally intended back in the good old “Semantic Web” days.

More specifically, a common biology research activity is to measure protein expression in two situations, for example in healthy people and in patients. The difference between the two sets of measurements is then computed, and the proteins (and their genes) that are activated in the illness situation are suspected to be possible targets for a new drug.

Gene differential expression is the biological counterpart of machine learning in CS: a one-size-fits-all solving methodology.

Indeed, those differentially expressed genes are rarely viable targets for a new drug, as each protein and gene is implicated in so many pathways. So instead of refining the experiment to find genes implicated in fewer pathways, a gene “enrichment” step is launched. “Enrichment” means querying an ontology database to obtain a list of genes/proteins related to the differentially expressed genes, hopefully easier targets for putative drugs.

Here there are two problems.
* The first is the choice of ontology. There is an excellent one named UniProt, but there are some awful yet often preferred choices, like Gene Ontology, which gives dozens of results where UniProt gives one. Indeed, if you have only one result after “enrichment” and you are in a hurry, you are not happy, so the incentive to switch to Gene Ontology is strong.
* The second problem arises when the result set comprises several hundred genes/proteins. Obviously this is useless, but instead of designing a better experiment, people thought some statistical criterion could sort the result set and extract a credible list of genes. This led to the design of parameter-free tools such as GSEA. Very roughly, these tools compare the statistical properties of the result set with those of a random distribution of genes; if the two are very different, the conclusion is that the genes are not there at random, which does not tell much more than that. This is similar and related to the criticism of Fisher’s test, p-values, and the null hypothesis, a complicated domain of knowledge.
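For reference, the simplest over-representation statistic such tools build on is a hypergeometric (Fisher-style) tail probability. The sketch below uses invented numbers and is not any particular tool’s implementation; GSEA itself ranks the whole gene list rather than testing a cutoff.

```python
from math import comb

def enrichment_pvalue(k, n, K, N):
    """One-sided hypergeometric tail: probability of seeing at least k
    annotated genes among n differentially expressed genes, when K of the
    N genes in the genome carry the annotation. This Fisher-style test is
    the core of many over-representation ("enrichment") tools."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / total

# Toy numbers (made up): 40 of 20,000 genes carry a pathway annotation,
# and 10 of our 100 differentially expressed genes are among them.
p = enrichment_pvalue(k=10, n=100, K=40, N=20000)
```

The criticism in the text applies directly: a tiny p-value here only says the overlap is unlikely under randomness, not that the gene list is biologically meaningful.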

These tools are very smart, but the best tool cannot provide meaningful answers from garbage, so disputes soon arose about the choice of parameter-free methodology, instead of questioning the dubious practices that made such tools necessary in the first place.


When PBPK simulators do not reflect modern physiology

PBPK simulators use a compartmental approach, where fluids are transferred between compartments and transformed inside them.
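A minimal sketch of that compartmental idea, with invented rate constants and a naive Euler integrator standing in for a production ODE solver:

```python
# Toy two-compartment model: drug amount moves between a central ("blood")
# and a peripheral ("tissue") compartment and is eliminated from the
# central one. Rate constants are invented for illustration.

def derivatives(t, y, k12=0.5, k21=0.3, kel=0.1):
    central, tissue = y
    d_central = -k12 * central + k21 * tissue - kel * central
    d_tissue = k12 * central - k21 * tissue
    return [d_central, d_tissue]

def euler(f, y0, t_end, dt=0.01):
    """Fixed-step Euler integration, standing in for a real ODE solver."""
    t, y = 0.0, list(y0)
    while t < t_end:
        dy = f(t, y)
        y = [yi + dt * dyi for yi, dyi in zip(y, dy)]
        t += dt
    return y

# 100 units injected into the central compartment, simulated for 24 hours.
central, tissue = euler(derivatives, [100.0, 0.0], t_end=24.0)
```

Real PBPK simulators build hundreds of such equations, one set per organ, and transform the drug inside compartments as well as transferring it between them.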

It is a very mechanistic approach, and a successful one, but it ignores many important aspects of mammalian biology, such as the influence of the genome on health, or signaling between cells and throughout the organism, for example via the immune system.

Even illness, or simply the unhealthy human, is not implemented in the models; rather, these are “cases” hard-wired into the software.

It is well known that the model must be separated from the simulator, to make it possible to change some parameters, or even the whole model, at will. Every CellML or SBML simulator offers that kind of functionality.

The same goes for genetic information: not only should it be taken into account, it should be separated out and accessible as its own set of portable data. I do not know how the SBML format could make this possible.

Cell and organism signaling should also be assigned to a distinct set of portable data. We already have something similar for fluids in our current simulator PoC: they are separated into a distinct XML file, something unfortunately not standardized.

We therefore have to think about how fluids, genetic information (and variants), signaling, and health will be taken into account in future versions of our simulator’s PoC.

In addition, we have to offer a multi-faceted GUI: for example, a human diabetic model and a dysfunction of insulin production are nearly the same thing, but they are different ways of discussing it and not exactly the same topic.


A valuable Old Timer

General Electric’s BiodMet is a Java PBPK simulator which is quite old now. It appeared in 2008 and has not received any improvements since at least 2013. Indeed, there are ferocious competitors such as Certara’s SimCyp and many others.

However, it is still an impressive piece of software, with a GUI providing a detailed simulation of many organs, seemingly down to the cell level. Indeed not everything is perfect: there is nothing about the eyes, the circulatory system is very basic, there is only one lung and one kidney, and genital organs are not simulated. But for free-to-use software it is nevertheless awesome. When it runs a simulation, the software reports that it uses 2,126 ODEs, which is extremely impressive.

In order to test the veracity of this claim, we used the same approach as in the last post. It turns out the claim is broadly true.

Actually the body itself is simulated with 52 equations (mostly modeling blood usage by organs at the cell level, plus a few modeling inter-organ fluid exchanges). There is also, for each organ, a set of 82 ODEs modeling how the drug moves from one compartment to the next and how it is transformed. The pattern is to model how the drug moves from the vasculature to the organ’s interstitial medium, from there back to the vasculature and into the cell’s cytosol, and from there to each compartment of the cell.

BiodMet is still available at: http://pdsl.research.ge.com/

When software does not deliver what it advertises

We are interested in competitors’ performance.

Most PBPK simulators follow a pattern where a GUI is used to design a physiology model, from which ODEs are constructed and handed to an ODE solver. The trick is to make it easy for the user to think in physiology terms when entering the model’s parameters or reading the simulation’s results, while at the same time letting the code drive the ODE solver according to this model.

Understanding what a solver really does is not so easy, particularly if the source code is unavailable. But even when it is available, little can be deduced from reading it, as a model is instantiated at run time rather than written out in the source code. Most of a simulator’s code serves the GUI; the solver is often provided by a third party such as Numerical Recipes (http://numerical.recipes/). Even the GUI relies on existing libraries such as Java’s DefaultMutableTreeNode or JFreeChart. To understand what the solver does, one has to observe the user-provided function that the ODE solver calls to progress from one step to the next.
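The logging idea can be sketched as follows, with a toy solver and model standing in for the real, decompiled ones:

```python
# Sketch of the observation technique: wrap the derivative function the
# solver calls, and record how many ODEs (state variables) it really
# handles. The solver and model below are toys; in the real experiment the
# user function was located by decompiling the simulator's binaries.

calls = {"count": 0, "n_states": None}

def instrument(user_fn):
    def wrapped(t, y):
        calls["count"] += 1
        calls["n_states"] = len(y)   # number of ODEs actually solved
        return user_fn(t, y)
    return wrapped

def toy_solver(f, y0, steps=10, dt=0.1):
    """Stand-in for the third-party ODE solver driving the user function."""
    t, y = 0.0, list(y0)
    for _ in range(steps):
        dy = f(t, y)
        y = [yi + dt * dyi for yi, dyi in zip(y, dy)]
        t += dt
    return y

f = instrument(lambda t, y: [-0.5 * yi for yi in y])
toy_solver(f, [1.0, 2.0, 3.0, 4.0, 5.0])
```

Whatever the simulator’s GUI claims, the length of the state vector handed to this function is the ground truth about how many equations are being solved.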

We took a small, free PBPK modeler. The literature about it is sparse but presents it favourably. The reader understands this software is a labour of love; the smallest details seem to be taken into account. Its GUI follows the form paradigm and is quite complex.

After decompiling the binaries, we logged each call to the user function, rebuilt the software, and ran it with the default values. In the end it was apparent that this simulator uses only 5 ODEs on only two compartments: lung and liver. Nothing is computed for the kidney, muscle, or heart, even though their models are described in much detail in the software’s literature.