What becomes of old biology tools?

One of my colleagues remarked today that a lot of old biology software tools and libraries are designed by academics and abandoned as soon as their interest (and funding) switches to something else.
It is therefore very dangerous for a community or a business to rely on this kind of tool: when something goes wrong, there is no expert in sight ready to offer a helping hand.
I wondered if something could be done to improve this situation. At least a list of such abandoned biology software could be maintained.

Ontologies are not magic wands

Some 15 years ago, ontologies were the big thing. Financing an EU project was easy if ontologies and semantics were mentioned as primary goals.
That time is gone, except in biology, where ontologies are still used, often in a very different way from what was originally intended in the good old days of the “Semantic Web”.

More specifically, a common biology research activity is to measure the expression of proteins in two situations, for example in healthy people and in patients. The difference between the two sets of measurements is then assessed, and the proteins (and their genes) that are activated in the illness situation are suspected to be possible targets for a new drug.

Differential gene expression is the biological counterpart of machine learning in CS: a one-size-fits-all problem-solving methodology.

In practice, those differentially expressed genes are rarely viable targets for a new drug, because each protein and gene is implicated in so many pathways. So instead of refining the experiment to find genes that are implicated in fewer pathways, a gene “enrichment” step is launched. “Enrichment” involves querying an ontology database to obtain a list of genes/proteins that are related to the differentially expressed genes and that are, hopefully, easier targets for putative drugs.

Here there are two problems.
* The first is the choice of the ontology. There is an excellent one, UniProt, but there are some awful yet often preferred choices, like Gene Ontology, which gives dozens of results where UniProt gives one. If you have only one result after “enrichment” and you are in a hurry, you are not happy, so the incentive to switch to Gene Ontology is strong.
* The second problem arises when the result set comprises several hundred genes/proteins. Obviously this is useless, but instead of trying to design a better experiment, people thought that some statistical criteria would sort the result set and yield a credible list of genes. This led to the design of parameter-free tools such as GSEA. Very roughly, these tools compare the statistical properties of the result set with those of a random distribution of genes; if they are very different, the conclusion is that the result set is not random, which does not tell much more than that. This is related to the usual criticism of Fisher’s exact test, p-values and null hypothesis testing, which is a complicated domain of knowledge.

These tools are very smart, but even the best tool cannot produce meaningful answers from garbage. Disputes soon arose about the choice of parameter-free methodology, instead of questioning the dubious practices that made such tools necessary in the first place.
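
To make the statistical point concrete, here is a minimal sketch of the simplest kind of test used in enrichment analysis: a one-sided hypergeometric test, which is equivalent to a one-sided Fisher’s exact test. It is not GSEA’s own algorithm, which ranks genes and uses a running-sum statistic. The function name and all the gene counts are made up for illustration.

```python
from math import comb


def enrichment_p_value(universe, annotated, selected, overlap):
    """One-sided hypergeometric p-value.

    universe  : total number of genes considered
    annotated : genes in the universe that carry a given annotation
    selected  : size of the differentially expressed gene list
    overlap   : selected genes that carry the annotation

    Returns P(X >= overlap) if `selected` genes were drawn at random.
    """
    max_overlap = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 genes, 200 annotated with some term,
# 300 differentially expressed, 12 of which carry the term
# (about 3 would be expected by chance).
p = enrichment_p_value(20000, 200, 300, 12)
print(f"p-value: {p:.2e}")
```

A small p-value here only says that an overlap of 12 is unlikely under random draws. It says nothing about which of the 12 genes, if any, is a sensible drug target.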


When PBPK simulators do not reflect modern physiology

PBPK simulators use a compartmental approach, where fluids are transferred between compartments and transformed inside them.

It is a very mechanistic approach, and a successful one, but it ignores many important aspects of mammalian biology, such as the influence of the genome on health or the signaling between cells and throughout the organism, for example by the immune system.
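
As an illustration of the compartmental approach, here is a minimal sketch with two compartments (blood and liver), first-order transfers between them and first-order transformation in the liver, integrated with an explicit Euler scheme. The compartments, rate constants and dose are arbitrary assumptions chosen for readability; a real PBPK simulator uses many more compartments and a proper ODE solver.

```python
def simulate(dose=100.0, k_bl=0.5, k_lb=0.2, k_met=0.1, dt=0.01, t_end=24.0):
    """Toy two-compartment model (all parameters are illustrative).

    dA_blood/dt = -k_bl * A_blood + k_lb * A_liver
    dA_liver/dt =  k_bl * A_blood - k_lb * A_liver - k_met * A_liver
    """
    blood, liver, metabolized = dose, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        to_liver = k_bl * blood      # transfer blood -> liver
        to_blood = k_lb * liver      # transfer liver -> blood
        transformed = k_met * liver  # transformation inside the liver
        blood += dt * (to_blood - to_liver)
        liver += dt * (to_liver - to_blood - transformed)
        metabolized += dt * transformed
    return blood, liver, metabolized


blood, liver, metabolized = simulate()
print(f"blood={blood:.2f} liver={liver:.2f} metabolized={metabolized:.2f}")
```

Nothing in this kind of model knows about genes, signaling or the immune system: everything is an amount moving between boxes at a fixed rate.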

Even illness, or simply the unhealthy human, is not expressed in the models; instead these are “cases” that are hard-wired in the software.

It is well known there is a need to separate the model from the simulator, in order to make it possible to change some parameters or even the whole model at will. Every CellML or SBML simulator offers that kind of functionality.
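
A minimal sketch of this separation, assuming a made-up JSON layout (it is neither SBML nor CellML): the model is pure data, and the simulator is a generic loop that knows nothing about physiology. An “unhealthy” variant is then just another data set, not a case hard-wired in code. All names and values are hypothetical.

```python
import json

# Hypothetical model description; "to": null means elimination.
MODEL_JSON = """
{
  "compartments": {"blood": 100.0, "liver": 0.0},
  "transfers": [
    {"from": "blood", "to": "liver", "rate": 0.5},
    {"from": "liver", "to": "blood", "rate": 0.2},
    {"from": "liver", "to": null, "rate": 0.1}
  ]
}
"""


def run(model, dt=0.01, t_end=24.0):
    """Generic first-order transfer simulator (explicit Euler)."""
    amounts = dict(model["compartments"])
    for _ in range(int(round(t_end / dt))):
        delta = {name: 0.0 for name in amounts}
        for tr in model["transfers"]:
            flow = tr["rate"] * amounts[tr["from"]] * dt
            delta[tr["from"]] -= flow
            if tr["to"] is not None:
                delta[tr["to"]] += flow
        for name in amounts:
            amounts[name] += delta[name]
    return amounts


healthy = json.loads(MODEL_JSON)
impaired = json.loads(MODEL_JSON)
impaired["transfers"][2]["rate"] = 0.01  # hypothetical reduced liver function

print("healthy :", run(healthy))
print("impaired:", run(impaired))
```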

The same goes for genetic information: not only should it be taken into account, it should also be separated out and accessible as its own set of portable data. I do not know how the SBML format would make this possible.

Cell or organism signaling should also be assigned to a distinct set of portable data. We already have something similar for fluids in our current simulator’s PoC: they are described in a separate XML file, in a format that is unfortunately not standardized.

Therefore we have to think about how fluids, genetic information (and variants), as well as signaling and health, will be taken into account in future versions of our simulator’s PoC.

In addition, we have to offer a multi-faceted GUI. For example, a human diabetic model and a dysfunction of insulin production are nearly the same thing, but they are different ways of talking about it and they are not exactly the same topic.


A valuable Old Timer

General Electric’s BioDMET is a Java PBPK simulator which is quite old now. It appeared in 2008 and has not received any improvements since at least 2013. Admittedly, there are ferocious competitors like Certara’s Simcyp and many others.

However, it is still an impressive piece of software, with a GUI providing a detailed simulation of many organs, seemingly down to cell level. Of course not everything is perfect: there is nothing about the eyes, the circulatory system is very basic, there is only one lung and one kidney, and the simulation of genital organs is not implemented. For free-to-use software it is nevertheless awesome. When the software runs a simulation, it reports that it uses 2126 ODEs, which is extremely impressive.

In order to test the veracity of this claim, we used the same approach as in the last post. It turns out this claim is more or less true.

Actually the body itself is simulated with 52 equations (mostly modeling blood usage by organs at cell level, plus a few modeling inter-organ fluid exchanges). In addition, each organ has a set of 82 ODEs to model how the drug moves from one compartment to the next and how it is transformed. The pattern is to model how the drug moves from the vasculature to the organ’s interstitial medium, and from there back to the vasculature or into the cell’s cytosol, and from there to each compartment of the cell.

BioDMET is still available at: http://pdsl.research.ge.com/

When software does not deliver what it advertises

We are interested in our competitors’ performance.

Most PBPK simulators follow a pattern where a GUI is used to design a physiology model, from which ODEs are constructed and then solved by an ODE solver. The trick is to make it easy for the user to think in physiological terms when she enters the model’s parameters or reads the simulation’s results, while at the same time enabling the code to drive an ODE solver according to this model.

Understanding what a solver really does is not so easy, particularly if the source code is not available. But even if it is available, what can be deduced from it is not very informative, as a model is instantiated at run time and is not written out in the source code. Most of a simulator’s code is in the GUI; the solver is often provided by a third party such as Numerical Recipes (http://numerical.recipes/). Even the GUI relies on existing libraries such as Java’s DefaultMutableTreeNode or JFreeChart. To understand what the solver does, one has to observe the user-provided function that the ODE solver calls to progress from one step to the next.

We took a small free PBPK modeler. The literature about it is sparse but presents it in a favourable manner. The reader gets the impression that this software is a labour of love in which minuscule details are taken into account. Its GUI follows the form paradigm and is quite complex.

After decompiling the binaries, we logged each call to the user function, rebuilt the software and ran it with the default values. In the end, it was apparent that this simulator only uses 5 ODEs on only two compartments: lung and liver. Nothing is computed for the kidney, muscle or heart, even though their models are described in much detail in this software’s literature.
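
The logging idea can be sketched as follows. This is only an illustration in Python, not the code we injected into the decompiled Java: the toy model, the Euler stepper and all names are hypothetical. The wrapper records, for each call made by the solver, the size of the state vector and which derivatives are actually non-zero.

```python
def make_logged(rhs):
    """Wrap a right-hand-side function so that every solver call is logged."""
    log = []

    def logged(t, y):
        dydt = rhs(t, y)
        active = [i for i, d in enumerate(dydt) if d != 0.0]
        log.append((t, len(y), active))
        return dydt

    return logged, log


def toy_rhs(t, y):
    # Hypothetical model: five declared states, only the first two evolve.
    return [-0.3 * y[0], 0.3 * y[0] - 0.1 * y[1], 0.0, 0.0, 0.0]


def euler(rhs, y0, dt, steps):
    """Stand-in for a third-party solver: it only sees the rhs callback."""
    t, y = 0.0, list(y0)
    for _ in range(steps):
        dydt = rhs(t, y)
        y = [yi + dt * di for yi, di in zip(y, dydt)]
        t += dt
    return y


logged_rhs, log = make_logged(toy_rhs)
euler(logged_rhs, [1.0, 0.0, 0.0, 0.0, 0.0], dt=0.1, steps=50)

active_states = sorted({i for _, _, active in log for i in active})
print(f"{len(log)} calls, state size {log[0][1]}, active states {active_states}")
```

Comparing the declared state size with the set of states that ever change is what reveals the gap between what a GUI describes and what the solver actually integrates.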


About physiology simulators

There are different kinds of software simulators of human physiology.

Some, like BioGears, aim at simulating a whole body and how it evolves when it is harmed or when a substance is injected into it.
Others, like BioDMET, are intended for pharmacology research, and they study how a foreign substance is assimilated and propagates through organs.

* Simulation grain
These programs also differ in the simulation grain they provide. While some simulate organs or sometimes even cells, others only make a rough simulation of the whole organ using, for example, the Glomerular Filtration Rate. Even when cells are simulated, the tissues they belong to are assumed to be homogeneous at organ scale, so no fine-grained physiology simulation is offered. No software will simulate structures like the loop of Henle in the kidney or the retinal epithelium, yet most scientific teams need to work on such tiny parts of the body. Most existing modeling software even ignores that there are two kidneys or two lungs and simulates them only as a whole. Organ illness is taken into account through an overall parameter that degrades the model output, not as a consequence of the model’s input parameters and architecture. With some tinkering this may help to accurately reproduce an in vivo model, but it does not help to understand why the illness exists in the first place.

* Software models should be based on biological concepts.
Most modeling tools use an approach based on mechanical properties of tissues, which is named ADME for “Absorption, Distribution, Metabolism, and Excretion”. One reason why it is very effective is that good mathematical tools (ODEs) and software tools (ODE solvers) exist. While it has been effective until now, this approach ignores basic biological effects such as the growth and depletion of cells and tissues. It also ignores metabolic and signaling pathways, which makes it entirely useless for whole classes of biological phenomena.
Other modeling tools are based on approaches even more foreign to physiology and biology. For example, BioGears started out on top of an electrical simulator (SPICE), so every BioGears model still has to be thought of as an electronic circuit.

* Integration in the laboratory’s tool chain
No existing modeling software is really integrated in the laboratory tool chain. In vitro testing is quite useful for characterizing specific processes taking place inside the living organism, and these processes can be integrated into physiologically based pharmacokinetic (PBPK) models. However, this integration is quite challenging and is often subcontracted to dedicated research services. This is particularly important when working with commercial organs-on-chips, where the organ features and behaviors are well characterized but the underlying model is still unknown. As modeling tools are supposed to be part of a tool chain between in vitro models and in vivo models, the modeling software should also create a standardized model that can be reused in other parts of the laboratory tool chain.

* Modeling software should make it possible to test multiple models.
Most software has its physiological knowledge (models) embedded in code; only a few programs load this kind of physiological information from human-readable files. Not only does this make the evolution of such software difficult and costly, it also hides the implicit model from the scientist and prevents her from substituting one model for another in order to fine-tune the simulation of an organ-on-chip.

At the same time there are still no tools to create organ-scale models in computational biology exchange languages like SBML or CellML.

* Accurately simulating foreign pathogens and substances.
To solve some problems it is necessary to know how a pathogen will develop in the human body and where it is likely to thrive. Any human physiology simulation tool must include a good simulator of those pathogens in the human body.

* Reducing cost of drug discovery
In silico research in medicine is thought to have the potential to speed up the rate of discovery while reducing the need for expensive lab work and clinical trials. One way to achieve this is by producing and screening drug candidates more effectively. A good modeling tool should automatically discover and propose a list of desired drug properties and drug candidates.

* Pre-clinical studies
Human weight and height are not standardized. Moreover, ethnicity, family history and other factors create variations, for example in kidney modeling. A drug should work in all those cases, and a drug candidate should be proposed not as the result of a statistical analysis, as is done in population modeling, but as a result of the model’s inner workings. It should be fact-based, not the output of a black-box model, to help the drug comply with regulatory bodies’ requirements and obtain the required authorization.