The rewards of working as a data wrangler


When geneticist Jacqueline Campbell’s postdoc funding was coming to an end, she came across a job ad that seemed to tap into an inner calling. The Crop Genome Informatics Laboratory at Iowa State University in Ames—where she was doing her postdoc—was looking for a research staff member to organize legume genome data. She enjoyed working at Iowa State and hoped to stay. The position offered a novel way to stay in research. Plus, some of her personality traits seemed to make her a good fit for the job, which requires intense organization and attention to detail. “You’d be awesome,” she recalls a colleague telling her. “If [something doesn’t] fit right, you get into this nervous tic until it’s fixed.”

She’s now a year into the job and enjoying it, though she doesn’t know whether she will stay in the field forever. “But right now I’m happy being a data curator because I feel like I can branch out,” she says. She enjoys working with several data types for several legumes, a contrast to the single focus she had as a researcher. She has also strengthened her coding skills, for example by writing scripts that help her format the database consistently.

Campbell also sees a career trajectory in the field for others who share her penchant for organization and want to explore unconventional ways to be part of research. New data will always be coming out, she says, and employers will need someone to organize them.

Delivering the data

Programming skills can come in handy for data curating—Campbell came into the job with experience writing scripts during her Ph.D. and some knowledge of UNIX programming—but the level required can vary. Oceanographer Vicki Ferrini, who for more than 10 years has managed the Marine Geoscience Data System as a research scientist at the Lamont-Doherty Earth Observatory in Palisades, New York, doesn’t see herself as a programmer. “I’m comfortable reading and tweaking code,” says Ferrini, who used MATLAB for her Ph.D. work. “I’m not necessarily the best at inventing it from scratch, but I can crib it from other code and make it work.”

Understanding database structure can also be useful. When Ferrini first started working with databases during her postdoc, she found that she enjoyed everything about database design. “It’s exactly how my brain likes to work,” she says. Curators can also work closely with programmers who develop the curating workflows and interactive features. Knowing basic programming and database structure helps them communicate their needs and the needs of database users across disciplines, curators say. However, not every job requires these proficiencies, so interested researchers should check with the databanks in their discipline for the skills they require.

Knowledge of the database’s field is important for understanding the data. Biologist Leonore Reiser, curator for The Arabidopsis Information Resource (TAIR), says some groups train undergrads to handle some of the more routine curating tasks, such as triaging the papers to be included in the database or annotating the features in a DNA sequence. “But there’s always somebody with a Ph.D. in a supervisory role to develop the pipeline and validate people’s work,” she says. (In some cases, a master’s degree may be sufficient.) The Ph.D. training is also important when interacting with the research community and thinking about how the database can serve researchers better, she notes. “You have to be able to talk to other scientists and really understand what the problems are.”

Curators constantly interact with researchers. They correspond with the data’s authors for information, answer database users’ questions, and hold workshops to promote the database. Curators also talk to each other to collaborate, such as linking entries between their databases when appropriate. So, strong communication skills are an asset. In his 7 years as a data curator for the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB-PDB), chemist Luigi di Costanzo picked up tricks such as changing the subject line in each email reply to relay his messages as efficiently as possible. As he honed his communication skills, he discovered a new passion: communicating science to a general audience. He now regularly contributes ideas to RCSB-PDB’s public outreach blog and judges at science festivals.

Although curators aren’t producing the data themselves, they are still plugged into the latest science. “I can see how the field is evolving right there right now right at the moment,” says di Costanzo, who was a depositor to the databank himself before he became a curator. He sees newly rendered protein structures before they’re published, and the researchers’ lab notes offer behind-the-scenes looks at the work required to get the structures. The role has also offered an interesting way to follow the careers of former colleagues, particularly as he is starting to see grad students from his postdoc lab transition to principal investigators (PIs). “Just last month, one of these students who’s now a professor just deposited a structure from his own lab,” which was a fun moment, he says.

Di Costanzo also contributes to peer-reviewed articles about the database’s new tools and studies the overall impact of the database through citations in textbooks. But these activities don’t quite take the place of doing research. “I am a chemist at heart,” di Costanzo says, and he sometimes misses handling lab equipment and running late-night experiments at research facilities. Nevertheless, being part of the databank is a “great thrill,” he says. When his former colleagues ask him why he doesn’t look for a job as a PI, he replies, “Oh, I work for the Protein Data Bank. I’m very satisfied with my job.”

Many data archives are supported by grants, so curating jobs can come with the threat of instability. Campbell’s curating position, for example, is funded for only 2-and-a-half years. The PIs of the project hope the grant will be renewed, but, Campbell says, “I have to be realistic and know that it might not.” Di Costanzo feels steadier. The grant that supports his database is for 5 years, he says. Within the grant cycle, “as long as the funding is guaranteed, and I keep doing a good job and meet expectations of the organization, I have no pressure for keeping my job,” he notes. Discussions about sustained support for databanks are ongoing. But all sectors, including industry and nonprofit, employ data curators, so the financial footing of a position can vary.

In the end, curators get satisfaction from knowing that their work is valuable to other researchers. When Reiser joined the Carnegie Institution after her postdoc in 1999 to help launch TAIR, “I wanted to create something that as a graduate student I would have loved to have,” she says. She came full circle this year at San Francisco’s March for Science. During the march, a group of plant biology grad students called out to her and her colleagues, “Wait, you guys are from TAIR?” Then, Reiser recalls, they exclaimed, “We love TAIR. It’s great!” “That just felt so genuine and so wonderful,” Reiser says, “like, yes, that’s why I did it. I did it for you.”

An earlier version inaccurately suggested that TAIR has an undergraduate training component. The text has been updated to reflect the interviewee’s comment.
