Computational Biologists: The Next Pharma Scientists?

Computational biology has been recognized as a field independent of its parent disciplines (computer science and biology) for just 15 years. For those seeking work in the field, it has been 15 tumultuous years. Employment in computational biology has gone through rapid booms and busts as various industries tried, with varying success, to find profitable uses for these scientists’ highly technical skills. Today, job prospects within computational biology — also known as bioinformatics — seem strong and appear to be growing, buoyed by pharmaceutical and biotech industries looking to take advantage of reams of genomics data and usher in a new era of drug discovery.


It could just be another boom, soon to be followed by another bust (no one can rule out that possibility), but these are good days for computational biologists. Even as scientists in other fields struggle to find jobs, computational biologists are being snapped up with lucrative salary offers as soon as they graduate, says Russ Altman, a professor of bioengineering, genetics, and medicine and director of the biomedical informatics training program at Stanford University in Palo Alto, California.

Altman doesn’t think we’re just at another high point in the boom-bust cycle. The reason computational biology never fully got off the ground before now, he says, is that pharmaceutical companies weren’t yet grappling with the kinds of problems that are best-suited to computational biologists: finding useful signals in tremendously large sets of unsorted, noisy data.

Welcome to big data

Bioinformatics is hardly new to the pharmaceutical industry. The problem is that until recently, those companies weren’t thinking big enough, Altman says. Traditionally trained chemists and biopharmacologists mostly analyzed their own data sets without any formal training in computing.

“Now there are these amazing data sets from extremely clever experimentalists who’ve figured out how to do things in high-throughput [experimentation], and they represent a substantial challenge to people who aren’t trained in computation because it passes what I call the ‘Excel barrier,’ ” Altman says. “I’ve been amazed at what a biologist with Excel can do, but we have now exceeded the Excel barrier in terms of the number of rows and columns and the computational powers of Excel.”

For the pharmaceutical industry, big data is the copious and ever-growing collection of freely and publicly available human genome data. Instead of systematically testing the effects of known compounds, the pharmaceutical industry’s basic model for more than a century, scientists can now work backward: combing through genomic data to find links between specific genotypes and diseases and then screening drug data to identify therapeutic candidates. But that kind of data simply won’t fit into an Excel spreadsheet.

“I think the old paradigm of ‘one drug, one target’ is quickly becoming outdated,” says Nicholas Tatonetti, a computational biology graduate student in Altman’s lab who is finishing his Ph.D. this year and who recently accepted an assistant professorship at Columbia University. “It [was] a smart way to think about it originally … and they took it really far and made billions of dollars. But what’s happened is, people forgot that biology is not so simple. The systems are really what we’re playing with here, not one protein doing one simple function.”

“If we can understand [these systems] — and the only way to really do that is through modeling with computational biology — then maybe we can predict the adverse effects of a drug or the therapeutic effects of a drug,” Tatonetti says.

Boom or bubble?

As the pharmaceutical industry’s blockbuster drugs fall off the patent cliff, with precious few drugs in the pipeline to replace them, there are signs that big pharma could turn more of its attention to biologically derived medicines. If that happens, computational biologists will likely play a leading role in their discovery, Altman says.

It’s not a job for traditional computer scientists. “They have no intuition for why they’re doing what they’re doing, so you’d have to train them in-house in a boot camp on the basics of biology, why certain assumptions are not OK, and why other assumptions are,” he says. “The comfort with ambiguity and fuzziness that we introduce in our training programs and, most importantly, the biological vocabulary,” mean that people with computational biology training “wind up being extremely valuable to these companies.”

By Altman’s own observation, demand for computational biologists far outstrips supply. “I was just talking to a colleague the other day from a major drug company who came in with a piece of paper with 15 bioinformatics jobs that they’re ready to hire tomorrow,” he says. The job listings, posted by Merck, were primarily for positions in Boston and in various cities in Pennsylvania.

The need is even more pronounced in California’s Silicon Valley area, Altman says. It’s not big pharmaceutical companies driving the demand there, he says, but small biotech companies that have realized they can capitalize on the enormous amount of publicly available health and genomics data.

Joel Dudley, a former student of Altman’s who last year founded NuMedii, one of Silicon Valley’s numerous biotech companies, agrees that computational biologists currently have a wealth of opportunities. Every person in his graduating class at Stanford received at least one job offer before graduation, he says, and most received more than one. “The job market is amazing,” he says.

Not all of the job offers come from companies looking to develop medicine. “There are other industries that have similar computational needs that have figured out that bioinformaticians are good at these things,” Dudley says. Google, Facebook, and Netflix are hiring computational biologists to sort through their own versions of big data. The competition can make it difficult for smaller companies to attract top talent, he says. “We have a [computational biology] position open at NuMedii, and it’s been very difficult to fill. The start-up people here are having a hard time hiring anybody.”

But that supercharged job market might be unique to Silicon Valley, says Philip Bourne, professor of structural bioinformatics at the University of California, San Diego, and co-founder and editor-in-chief of the journal . Bourne doesn’t doubt that well-trained computational biologists are in demand, but he cautions against being overly optimistic or extrapolating Silicon Valley’s success to other regions.

San Diego went through a similar hiring bonanza 3 or 4 years ago, Bourne says, and many of the companies that hired those scientists have since gone out of business. There’s no way to know, but Silicon Valley’s computational biology job market could follow a similar pattern, Bourne says.

“If you’re in areas where the need is actually growing, such as next-generation sequencing … or if you’re doing anything with translational medicine, … I think the job market is pretty good,” he says. But he’s less optimistic than Altman about big pharma committing to a computational biology approach anytime soon. “It’s like a supertanker. It takes miles to stop and change directions. I think they do get it now, to various degrees, but it’s a slow process to change.”

Enoch Huang, head of computational sciences at Pfizer PharmaTherapeutics Research and Development in Cambridge, Massachusetts, says that big pharma needs scientists with computational biology skills right now, but the needs are precise. “I think there are specific needs that the biopharmaceutical industry is looking for, problems that are essential to the pipeline that need to be solved,” he says. Particularly in demand are biologists with expertise in statistics, mathematical modeling, machine learning, artificial intelligence, software engineering, or database design. “If you have expertise that enables an organization such as Pfizer to be more successful than they have been historically, then there are plenty of roles,” Huang says.

It’s hardly a feeding frenzy. Pfizer employs between 50 and 60 computational biologists, depending on how you define them. Hiring for such jobs has slowed, Huang says, but the company is still looking to hire scientists with the right skills.

Huang himself will help make the hiring decisions for a yet-to-be-determined number of computational biologists as Pfizer moves its therapeutic research divisions to Cambridge, Massachusetts. “I would say it continues to be a great interest for our research units,” he says.


If there truly is a shortage of computational biologists in pharma and biotech, Altman says, the federal government should be investing more heavily in training. Three institutes at the National Institutes of Health (NIH) — the National Institute of General Medical Sciences, the National Cancer Institute, and the National Library of Medicine — fund bioinformatics training programs, he says, but that’s not enough. Altman’s training grant from NIH was renewed recently, but its funding was reduced, so he’ll have to trim three or four training slots. “I do think it’s worrisome that a field that’s exploding is seeing a reduced amount of support for training.”

A recently announced Obama Administration project could reverse or offset that trend. The Big Data Research and Development Initiative pledges $200 million for NIH, the National Science Foundation, and other federal agencies to support data collection, analysis, and dissemination. Some of that money will go toward training computational biologists. When the program comes online, it could result in some direct hiring within the agencies it supports.

NIH’s director, Francis Collins, seems to believe in the field’s importance. At a 29 March briefing announcing the initiative in Washington, D.C., Collins told Science Careers, “If I were a senior or first-year graduate student interested in biology, I would migrate as fast as I could into the field of computational biology. … There are vast quantities of high-quality data accessible to anybody who has the skills to find the nuggets of truth that are hiding in that information.”

Looming behind such statements, though, is the specter that big data could be a big bust. Gilbert Omenn, who directs the Center for Computational Medicine and Bioinformatics at the University of Michigan, Ann Arbor, and is a senior science director for the National Center for Integrative Biomedical Informatics, has seen that kind of bust firsthand. In 2001, Pfizer began investing in an Ann Arbor medical research campus, which hosted a contingent of computational biologists. At its peak, the bioinformatics team consisted of 15 scientists. But, although the team did good work, fruitful results never materialized, Omenn says. Pfizer began cutting jobs, and the lab closed in 2009. “They said, ‘All these biomarkers and things we thought were going to be so productive for our drug discovery process and discovering new targets — now we’re drowning in targets and having the same old problems with drugs failing,’ ” Omenn says.

There’s reason to believe that drug development companies may have learned from those missteps, Omenn says. He says companies have figured out that bioinformatics scientists work better when they’re embedded within research teams, helping to guide target selection from the beginning. And as more data become available, the ability of computational biologists to do important work improves. On the whole, Omenn says, expect a healthy employment market for the foreseeable future. “I think the capacity to absorb these people is pretty large.”
