Sarah Webb
Mathematics can turn experimental data into information, if the personality fits
Stem cells compute. "A feedback circuit based in silicon will have certain properties that a feedback circuit constructed from biological molecules will have," says Ihor Lemischka, who directs the Black Family Stem Cell Institute at Mount Sinai Medical Center in New York. But figuring out those cellular circuits depends on networks made not just of silicon but also of flesh-and-blood scientists.
Systems biology emerged around 2000, along with microarrays and other high-throughput techniques that can collect hard drives full of quantitative biological data. The goal is to use those data to build simulations of biological phenomena, and so to create hypotheses and 'intuition' beyond that of a single human brain. Not only does the analysis of all these data require a mathematical framework and computational tools, but it also presents interesting questions for theoreticians with applied mathematics, computer science or engineering backgrounds.
Now that stem cell biology is producing enough quantitative data, it makes sense for biologists with large datasets and researchers with computational know-how to put their heads together. In labs and offices overlooking New York's Central Park, researchers at Mount Sinai Medical Center are forging one of a growing number of collaborations. "Stem cells are one of the best systems that you can study," says Avi Ma'ayan, a computational biologist who works with Lemischka. That's because stem cells are relatively homogeneous and easy to manipulate, and manipulating particular genes and proteins in these cells gives defined observable outcomes. It also helps, of course, that stem cells are at the centre of developmental biology and regenerative medicine.
Specifically, the Mount Sinai researchers hope that systems biology can show how molecular processes within a cell control its fate. Lemischka works with his biologist colleagues to perturb gene expression in stem cells. Using the technique of RNA interference, they can remove one transcription factor at a time, starting at time zero, right after cell division, and then measure how quantities such as the transcription levels of genes change over time. "The question becomes 'What causes what to change?' Can you use these data to come up with some kind of putative wiring diagram?" Lemischka asks.
With so many possible interactions, the human mind can't wrap itself around the problem unaided. Computational methods provide the tools that allow researchers to make sense of the sea of information. "It's difficult to build hypotheses compared to what you would do if you only measured three or four cell components or outcomes," says Ma'ayan. Even if a cell contained just 100 different interacting proteins, the number of possible combinations would be astronomical. Computational tools can help rule out 99.9% of the possible wiring diagrams, and mathematical simulations can provide a rationale for further experiments.
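To give a rough sense of what 'astronomical' means here, the following back-of-envelope sketch counts candidate wiring diagrams. It rests on a simplifying assumption that is not in the article: a wiring diagram is treated as a directed graph in which each ordered pair of distinct proteins either has a link or does not. The function names are illustrative only.

```python
import math


def count_wiring_diagrams(n_components: int) -> int:
    """Count directed graphs (no self-loops) on n_components nodes.

    Assumption for illustration: a wiring diagram is fully described by
    which ordered pairs ("A acts on B") are present.
    """
    possible_links = n_components * (n_components - 1)
    return 2 ** possible_links


def decimal_digits(value: int) -> int:
    """Number of decimal digits in a positive integer."""
    return math.floor(math.log10(value)) + 1


if __name__ == "__main__":
    total = count_wiring_diagrams(100)
    remaining = total // 1000  # what is left after excluding 99.9%
    print(decimal_digits(total), decimal_digits(remaining))
```

Under these assumptions the count for 100 proteins is a number with 2,981 decimal digits, and ruling out 99.9% of the candidates still leaves a number with 2,978 digits.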
Imaginary landscapes
Ma'ayan's research integrates known data about protein-protein interactions with experimental data from Lemischka's group. Using the lists of genes or proteins that change in those experiments alongside interactions reported in the literature, he builds 'networks' — modules that describe how, under different conditions, genes and proteins work together to alter cellular functions. Statistics helps to fish out potentially meaningful interactions, or nodes, from the network. From there, Ma'ayan can extract a biological theme, which might lead Lemischka to zero in on particular transcription factors.
These models rely on metaphors, derived from physics, which were coined decades ago by theoretical biologist Stuart Kauffman and geneticist and developmental biologist Conrad Waddington. The mathematical models describe an energy 'landscape' in which cells, represented as marbles, roll down peaks that correspond to higher-energy, unfavourable states and gravitate toward lower-energy valleys. By creating a theoretical 'topographical map', Ma'ayan and postdoc Ben MacArthur can predict how a cell would navigate such a landscape in response to particular biological scenarios. Under certain scenarios, transcriptional randomness serves as 'noise' that vibrates the marble within the system. "If it's vibrating a lot, it can bounce over barriers by chance," MacArthur says. Assuming that cells evolved in a 'noisy' environment — in which random interactions are happening all the time — it's not unreasonable to think that cells developed ways to take advantage of that noise, Lemischka says.
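The marble picture can be made concrete with a toy simulation. The sketch below is not the Mount Sinai group's model: it assumes a one-dimensional landscape with two valleys, U(x) = x**4/4 - x**2/2, and a marble that drifts downhill while receiving random kicks whose size is set by a hypothetical `noise` parameter. It counts how often the marble hops from one valley to the other.

```python
import math
import random


def force(x: float) -> float:
    """Downhill force -dU/dx for the landscape U(x) = x**4/4 - x**2/2."""
    return x - x ** 3


def count_valley_switches(noise: float, steps: int = 20000,
                          dt: float = 0.01, seed: int = 0) -> int:
    """Simulate a noisy marble and count hops between the two valleys.

    The valleys sit at x = -1 and x = +1, separated by a barrier at x = 0.
    A hop is counted only once the marble is well inside the other valley
    (beyond +/-0.5), so jitter on top of the barrier is not over-counted.
    """
    rng = random.Random(seed)
    kick = math.sqrt(2.0 * noise * dt)  # typical size of one random kick
    x = 1.0          # start at the bottom of the right-hand valley
    valley = 1       # +1 for the right valley, -1 for the left
    switches = 0
    for _ in range(steps):
        x += force(x) * dt + kick * rng.gauss(0.0, 1.0)
        if valley == 1 and x < -0.5:
            valley, switches = -1, switches + 1
        elif valley == -1 and x > 0.5:
            valley, switches = 1, switches + 1
    return switches


if __name__ == "__main__":
    for noise in (0.0, 0.02, 0.5):
        print(noise, count_valley_switches(noise))
```

With no noise the marble stays in the valley where it started. As the noise grows relative to the barrier height (0.25 in this landscape), hops become frequent, which is the sense in which a strongly vibrating marble can bounce over barriers by chance.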
But mathematical biology has acquired a bad name among some experimentalists, Lemischka says, because too often the landscapes it comes up with seem more imagined than real. "You can only do simulations if you have enough facts on the ground," he says.
But no amount of data is enough, says David Gifford, a computational biologist at Massachusetts Institute of Technology in Cambridge. "A suitably complex model can explain any data. It memorizes the data you give it," a problem known as overfitting, he says. A model is only worthwhile if it can predict data that were not used to construct it. Quick acceptance of a model's conclusions can be a real danger if scientists approach some available set of data as a mathematical exercise rather than with a sincere curiosity about the biological phenomena that generated it. "There are a lot of people who do high-throughput biology in a drive-by kind of way," Gifford says. The key to fruitful collaborations is colleagues who are really interested in the details of how the biology works and are willing to do difficult experiments.
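Gifford's point about memorization can be illustrated with a small, entirely synthetic example. The sketch below assumes made-up data that follow a straight line plus random scatter, and compares two hypothetical models: a 'memorizer' that simply returns the stored value of the nearest training point, and an ordinary least-squares line. Neither is drawn from the work described in the article.

```python
import random


def make_data(seed: int = 1, n_points: int = 400, scatter: float = 0.5):
    """Synthetic data: y = 2x + 1 plus Gaussian scatter, split into two halves."""
    rng = random.Random(seed)
    xs = [i * 0.01 for i in range(n_points)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, scatter) for x in xs]
    train = (xs[0::2], ys[0::2])      # used to build the models
    held_out = (xs[1::2], ys[1::2])   # never shown to the models
    return train, held_out


def fit_memorizer(xs, ys):
    """A model that stores every training point and returns the nearest one."""
    table = list(zip(xs, ys))
    return lambda x: min(table, key=lambda point: abs(point[0] - x))[1]


def fit_line(xs, ys):
    """Ordinary least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, xs, ys) -> float:
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    train, held_out = make_data()
    for name, fit in (("memorizer", fit_memorizer), ("line", fit_line)):
        model = fit(*train)
        print(name,
              mean_squared_error(model, *train),
              mean_squared_error(model, *held_out))
```

The memorizer reproduces its training data exactly, so judged on those data alone it looks perfect. Only the held-out half shows that it has learned the scatter rather than the trend, which is why a model should be scored on data that played no part in building it.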
Imagining reality
Generally, model making is iterative: constructing, testing, reconstructing, planning the best way to leave out certain data so that they can be used to evaluate the model, or, perhaps, identifying additional data to collect. As Lemischka and his co-workers refine their experiments to collect the most useful data, Ma'ayan and number-focused colleagues tweak their mathematical models. Together the researchers uncover genetic interactions and feedback loops, and the two strands of work feed off each other to form a collaborative loop of their own, an idea Ma'ayan is quick to sketch out (Figure 1). It's a process that can apply equally to modelling the back-and-forth between gene transcription and cell fate, how the dynamic balance of transcription factors defines stemness, or how the native state of an embryonic stem cell, as defined by measurable parameters such as protein-DNA interactions or the packaging of the genome, compares with the state of reprogrammed or reprogramming cells.
Computational stem cell research piqued MacArthur's interest during his previous postdoctoral position at the University of Southampton, UK, working with researchers who built models of tumour and tissue growth. There he used mathematical modelling to examine how the transcription factors Oct4, Sox2 and Nanog interact with a set of other genes that determine cell lineage and cell differentiation into a particular type. Even with the growing interest in mathematical approaches, he says, opportunities to do high-quality stem cell research that bring together computational and experimental scientists are still relatively rare.
"Sometimes it's difficult because the thinking is different, the training is different," says Ma'ayan. Even seemingly simple concepts such as 'a model' can be sources of confusion. As MacArthur points out, where he might be talking about a mathematical construct, a biologist might be referring to a knockout mouse. "You have to spend a lot of time talking with people and getting a shared language," he says.
Although Lemischka doesn't need to know how to solve a system of differential equations, he does need to understand the fundamental principle: that such equations describe the rate at which one quantity changes relative to another. "I need to know how to explain a biological problem in such a way that someone who understands differential equations can translate that problem into mathematical terms," he says.
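As a minimal illustration of that principle, consider a single hypothetical gene product that is made at a constant rate and broken down in proportion to how much of it is present. The equation and the parameter values below are assumptions chosen for illustration, not taken from the researchers' models; the code simply steps the equation forward in small time increments.

```python
def simulate_expression(production_rate: float, degradation_rate: float,
                        initial_level: float = 0.0, dt: float = 0.01,
                        total_time: float = 40.0) -> list:
    """Step the equation d(level)/dt = production - degradation * level.

    Uses the simplest possible scheme (forward Euler): at each step the
    level changes by its current rate of change multiplied by dt.
    """
    steps = int(round(total_time / dt))
    level = initial_level
    trajectory = [level]
    for _ in range(steps):
        rate_of_change = production_rate - degradation_rate * level
        level += rate_of_change * dt
        trajectory.append(level)
    return trajectory


if __name__ == "__main__":
    levels = simulate_expression(production_rate=2.0, degradation_rate=0.5)
    print(levels[0], levels[len(levels) // 2], levels[-1])
```

The level rises quickly at first and then flattens as production and breakdown come into balance, settling near production_rate / degradation_rate. Translating a biological statement such as 'this protein is made steadily and decays' into a rule of this kind is the step that Lemischka describes.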
That's not always something biologists are comfortable with. Modelling translates 'rule-less' biology into mathematics, which is basically a set of rules, says Franziska Michor, who builds models of cancer and cancer stem cells at Memorial Sloan Kettering Cancer Center in New York. Biologists worry that forcing observations into equations can sand away the complexity in which more fundamental principles might be found.
One critical question for applied mathematicians is what level of abstraction — cells, genes, molecules or even atoms — leads to a manageable mathematical system that is also appropriate to the biological question, she says. Whereas Michor tends to base her modelling on cells or populations of cells, the Mount Sinai group is focusing on individual genes. "The art is to pick out at which level you want to be and how you want to translate," Michor says.
Traditionally, molecular biologists have searched for causal relationships between cellular factors by looking for the factor sitting upstream, says Sui Huang of the Institute for Biocomplexity and Informatics at the University of Calgary, Canada. This is a mindset, he says, that results in "a cartoon of the chain of causation." Dissatisfaction with the pathway approach led Huang to learn computational methods so that he could investigate why cells within a genetically identical population do not behave identically (see Chance and the single cell). Huang took an unusual path to gain his combination of skills, doing a PhD in an experimental biology environment but learning physics, mathematics and computational methods in the 'off hours' during his doctoral and postdoctoral studies. "There are really only a limited number of biologists who are open to the abstraction and simplification that is needed to frame a problem in mathematical terms," Huang says.
Computational biologists say that potential colleagues are becoming more likely to embrace their ideas than to shun them. When cancer modeller Trachette Jackson started her independent career in 1999, she felt like a salesman. "I had to go and knock on a few doors, ask questions about what was going on in particular experimental labs and try to convince researchers that there might be a mathematical or computational tool that could answer some of these questions," she says. But now that she is a more established researcher at the University of Michigan in Ann Arbor, and her field is more widely accepted, experimentalists are more likely to approach her to discuss research questions, she says. The field is too young to have a surfeit of success stories, but more systems biologists than ever are assembling long-term collaborations and securing funding for these projects. Gifford and his colleagues at Columbia University in New York and the Whitehead Institute in Cambridge, Massachusetts, have been collaborating since 2006 on an NIH-funded project to build predictive models of neural development. Michor, of Memorial Sloan Kettering, received an NIH grant in 2008 and has served on an NIH study section, which suggests a growing recognition of computational biologists. On the other hand, she notes, publishing computational papers in biomedical journals can still be difficult.
This kind of research is very much in the growth phase, says Lemischka. "In the short, medium and long terms, we won't be able to do stem cell biology without systems biology."
Source: Nature