In the summer of 2008, when Wired magazine ran a cover story titled “The End of Science,” former Editor-in-Chief Chris Anderson wrote, “The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all. There’s no reason to cling to our old ways. It’s time to ask: What can science learn from Google?”
Five years later—not a lot of time, admittedly—data, computers, and statistical tools are indeed having a major impact on science. In the domain sciences—traditional fields like physics, biology, and chemistry—the old ways are holding. People still care about causation, mechanisms, and coherent theories, but in many disciplines, researchers are looking to supplement those traditional elements of science, harvesting gains from the data deluge by, in effect, learning from Google. In May, Science Careers wrote about some of these “pi-shaped” researchers who have added computer science techniques to the techniques of their various native fields.
But it’s also possible to get pi-shaped from the other end—the computer science end. By adding fundamentals of biology to their computer science training, more than a few native computer scientists are contributing—and preparing themselves to contribute—to the advancement of the life sciences (among other fields), laying foundations for new branches of study. It is indeed a promising approach, but there is still much work to be done—satisfying work, one pioneer says.
Biology for computer scientists
Lawrence Hunter graduated from Yale University with a Ph.D. in computer science in 1989, during what he calls “the AI winter,” a period of reduced funding and interest in artificial intelligence—the field he had been studying. So he went in a new direction, joining the fledgling Human Genome Project as a programmer. “I had studied biology till tenth grade, which meant I had no modern biology at all,” he says. Hunter attended weekly genomics seminars at the U.S. National Library of Medicine (NLM) where he worked. As his own coding began yielding results—for example, a list of genes that allowed eukaryotes to diverge from prokaryotes—he began to take biology more seriously. He asked a lot of questions and read all of the papers that his colleagues suggested. “I spent a decade learning by osmosis from all the brilliant people around me,” says Hunter, who is considered one of the founders of bioinformatics.
Hunter is now the director of the computational bioscience program at the University of Colorado School of Medicine. He has published seminal papers in his field and written a life sciences textbook aimed at non-life scientists.
Hunter believes that computer scientists can cross over at any career stage. “Studying CS is like learning to play a musical instrument and can’t be done quickly. Biology can be learned by reading and remembering,” he says.
Synthetic biology, which involves designing and building genetic constructs and testing them in living cells, requires wet lab skills. “The combination of quantitative abilities and experimental biology skills is very valuable in synthetic biology,” says Timothy Lu of the Massachusetts Institute of Technology’s (MIT’s) Research Laboratory of Electronics, an interdepartmental center where “research encompasses an extensive range of natural and man-made phenomena.”
Nevertheless, Lu, who has an undergraduate degree with majors in electrical engineering and computer science (and minors in biomedical engineering and biology), doesn’t make bench experience a prerequisite for his incoming graduate students. He just expects them to be enthusiastic about learning biology.
That approach worked for Samuel Perli, a 5th-year doctoral candidate in Lu’s lab who had no college-level biology to speak of before he started his graduate studies in computer science at MIT. While doing research on wireless networks, he attended a talk by the director of the institute’s Synthetic Biology Center. He was inspired to sign up for “Introduction to Experimental Biology,” after squaring away the requirements for his master’s degree. That semester, he read books, research papers, and popular articles on biology, and had conversations with friends who did biology research. He was following in the footsteps of the academics who had created this field less than 2 decades ago. “I love the fact that once I program and modify a single living cell, I can have millions of similar cells by just growing them up,” says Perli, who is now at the bench almost every day.
At Harvard Medical School, Pamela Silver is a founding member of the Department of Systems Biology, which she describes as a field that, among other things, “seeks to understand what evolution gave us and how we got to where we are.” Though we have genomic information about various organisms, we still have to synthesize it to understand how the whole organism or system functions, she explains in a YouTube video. Like Lu, she doesn’t require incoming researchers to have a background in cell biology.
On her advice, Avi Robinson-Mosher, a postdoc in her lab with a Ph.D. in computer science, took an intensive summer course in Physiology at the Marine Biological Laboratory in Woods Hole, Massachusetts. Robinson-Mosher could’ve worked for Pixar, Silver says, but he now uses his simulation skills to understand macromolecular interactions, research that could help scientists engineer novel proteins for drugs. “Systems biology offered a good combination of being able to apply my computational background to actually making things that can do something useful for people,” he says.
Inside black boxes
Vaughn Iverson worked for more than 10 years in industry before leaving Intel to go back to his old school, the University of Washington (UW), in Seattle, as a graduate student in oceanography. “For me, learning CS was always a means to an end for leveraging the power of computation to solve scientific or engineering problems,” says Iverson, who minored in chemistry as an undergraduate. “The seeds of my transition were planted much earlier.”
Iverson, who will defend his Ph.D. thesis this fall under E. Virginia Armbrust, has been studying single-celled organisms in seawater. Only a small percentage of those microbes can be isolated and grown in the lab, so Iverson and the team skipped those steps, sequencing a whole microbial community and proceeding algorithmically to disentangle details of the individual organisms in the community. Microbes that could not be cultured previously, and those that form only a negligible percentage of a sample, were revealed, along with their genomes and their roles in the community. Recently, he published the results of his research on microbial metagenomics.
The special role of computer scientists in contemporary research, Iverson says, is roughly analogous to the role in other fields of the designers of sophisticated scientific instruments: sequencing machines, telescopes, particle colliders, and so on. Domain scientists may not be involved with the design, maintenance, or even operation of the instruments. Similarly, for most of the scientists who depend on them, domain-specific software tools are “black boxes.” This is where “pi-shaped” researchers have an advantage, Iverson says: They can “open up the black boxes, demystify them, provide guidance on their limitations and proper use, and improve or replace them as necessary.”
Big data science
Joseph Hellerstein, manager of computational discovery for science at Google, says that the relationship between computation and domain science began to change with the Human Genome Project. Through its Exacycle project, Google has lent some of its considerable computing muscle to a half-dozen big data problems being tackled in academia. Some of these projects promise to deliver tangible benefits to society, and all of them are likely to provide insights into the underlying science. Five of those six projects are in the life sciences.
Biochemistry, Hellerstein says, is foundational knowledge in the 21st century. So this fall he will be teaching a semester-long course, “Biochemistry for Computer Scientists,” at UW’s eScience Institute. “Whether it is medicine, for machines through nanotechnology, in agriculture or materials, design problems require simultaneous innovation in computing and science that can only be accomplished by those with the combined skills.”