Your Data, Warts and All

Laboratory science can be complicated, dominated by messy data, negative results, and partial answers. This makes communicating science a difficult exercise, all the more because of the powerful pressure to publish—and to publish beautiful results—especially on young scientists who aspire to have academic careers. “The pressure is very high, very early,” says Brian Nosek, who is an associate professor in the Department of Psychology at the University of Virginia in Charlottesville. These days there are “more people for fewer jobs, and because people can produce a lot of data and a lot of findings very rapidly in modern society, the pressure … to get more and more positive results, more and more clean results, it’s only increasing.”

But there’s another source of pressure putting early-career scientists in a vise. Comprehensive, transparent disclosure of scientific results has become possible only in the last few years, thanks to the Internet, and because of new scientific approaches and new insights it has, over the same period, become essential.

Career pressures and transparent reporting

It’s difficult to quantify the problem, but the pressure to publish encourages many scientists—either consciously or unconsciously—to compromise the reporting of their research.

The current system also favors publishing certain kinds of results. Most of the data scientists produce in the lab is messy and uncertain, but “publishing … primarily supports getting positive results, innovative results, and very clean and tidy results,” Nosek says. This makes researchers “very highly motivated, in order to have a successful career, to make it look nicer than it really is.”

Some researchers may, for example, mine their data selectively, remove outliers or tinker with statistical methods until a significant result shows up. Biology and social sciences, especially, lend themselves to “trying out all sorts of different tests and ways to analyze your data, and then only reporting the best result,” Fanelli says. “A lot of people do this without even realizing how this then statistically biases the likelihood that what you’re showing is real.”  

As for negative results, few journals are interested in them. That’s too bad because “If we do not report the various studies that … didn’t work or didn’t come out the way we thought they would, and only report the ones that were in line with what we thought was going to occur or what … existing theories think, then we are ourselves biasing the literature by only reporting a subset” of what we found, Nosek says.

Methodologies, too, are often sloppily described. Some papers are unclear about whether a clinical trial was randomized or not, do not include controls, or do not describe the statistical analysis adequately, says Virginia Barbour, who is the medicine editorial director at PLOS in Brisbane, Australia, and chair of the Committee on Publication Ethics. “Most of the time, if you really want to reproduce the results from a paper, you’re probably going to have to call up the PI [principal investigator] of the paper and ask them for more detailed information,” Barbour says. But “If you read a recipe, you don’t expect to have to ring up the person who wrote the cook book and say, actually I couldn’t really figure that bit out.” Also, research papers are often written in a way that amplifies the importance of the results, she adds. “That happens all the time.”

Improved practice

The career-related pressures are real, but scientists can (and should) choose not to succumb. Fortunately, there’s a new emphasis on openness in the reporting of scientific results, which somewhat mitigates these unhealthy career pressures. In the meantime, there are things early-career researchers can do to balance scientific and career imperatives: some easy, others a little more daring.

Researchers are often tempted to rush through the methods section so they can spend more time and space on the results. But for readers, that works poorly: “[I]f you don’t know the methods you can’t really decide if the results are valid,” Wager says. “The traditional advice is that there should be enough detail that somebody else can repeat the experiment.” That’s easier nowadays, Wager adds, since journals now provide space for supplementary materials. So now you can publish an expanded, more detailed methods section as an appendix to the main paper.

Indeed, online supplementary materials allow you to go further by presenting a more comprehensive context for your core results, including ancillary data. If, for example, a set of genetic sequences is essential to reproducing the work, you can make it available through a freely accessible database like GenBank, Barbour says.

Your statistical analysis should be fully disclosed and explained. “If you’ve used some special statistical method or some method for accounting for things, you’ve got to say exactly what you did,” Wager says.  

It is just as important to say why you did it that way. “If you can say, for example, that the method was prespecified, that’s what you planned to do even before you saw the results, that is a good thing,” Wager says. If you chose a method after the fact, explain and justify the choice. The goal is to convince readers that your choice of methods is responsible and appropriate—and not opportunistic.

Real-life science is messy. There are almost always outliers, for a wide range of reasons, and they often give scientists headaches. But don’t even consider hiding them. “If you start taking data points out, that is highly problematic and that amounts to misreporting of data,” Barbour says. “If your data is messy, it may well be that you just need to repeat it a number of times until you are sure that what you’re presenting is accurate,” she adds. If those outliers persist, report and discuss them. “If you think a particular variable or a particular data point should be removed, you should be able to publicly defend why … and report what the implications are for leaving it in versus taking it out. And then let the reader decide,” Nosek says.
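One way to act on Nosek’s suggestion is to run the same summary twice, once with every data point and once with the suspect point removed, and to report both. The short Python sketch below illustrates the idea. The measurements are made up, and the function names (summarize, with_and_without) are hypothetical helpers invented for this example, not part of any established tool.

```python
from statistics import mean, stdev


def summarize(values):
    """Return the sample size, mean, and sample standard deviation."""
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}


def with_and_without(values, suspect_index):
    """Summarize the data with all points and with one suspect point removed."""
    all_points = summarize(values)
    trimmed = [v for i, v in enumerate(values) if i != suspect_index]
    suspect_removed = summarize(trimmed)
    return {
        "all_points": all_points,
        "suspect_removed": suspect_removed,
        # How far the mean moves when the suspect point is dropped.
        "mean_shift": suspect_removed["mean"] - all_points["mean"],
    }


# Made-up measurements for illustration; the last value is the suspected outlier.
measurements = [4.8, 5.1, 5.0, 4.9, 5.2, 9.7]
report = with_and_without(measurements, suspect_index=5)

for label in ("all_points", "suspect_removed"):
    s = report[label]
    print(f"{label}: n={s['n']}, mean={s['mean']:.2f}, sd={s['sd']:.2f}")
print(f"mean shift when removed: {report['mean_shift']:.2f}")
```

Publishing both rows, along with the reason the point is suspect, lets readers judge for themselves how much the conclusion depends on it.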

It is also important to reveal all of your experiments, especially if you did 51 of them and only one turned out to be statistically significant. “If you really think that this is a valuable result to publish, tell us about the 50 experiments as well,” Fanelli says. “It still might be right,” but other scientists need to know about all of it so that they can make their own judgments, Fanelli says. Reporting only the statistically significant result and hiding the rest “borderlines into actual fraud.”
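A rough calculation shows why the other 50 experiments matter. The Python sketch below makes several simplifying assumptions that do not come from our sources: a significance threshold of 0.05, experiments that are statistically independent, and no real effect in any of them. Under those assumptions, it computes how likely it is that at least one of 51 experiments comes out “significant” by chance alone.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent tests is significant
    when every null hypothesis is true (assumed here for illustration)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Average number of chance 'hits' across n tests of true nulls."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the overall false-positive rate near alpha."""
    return alpha / n_tests


n_experiments = 51
p_any = prob_at_least_one_false_positive(n_experiments)
expected = expected_false_positives(n_experiments)
threshold = bonferroni_threshold(n_experiments)

print(f"P(at least one chance hit in {n_experiments} tests): {p_any:.3f}")
print(f"Expected chance hits: {expected:.2f}")
print(f"Bonferroni per-test threshold: {threshold:.5f}")
```

With these assumptions, the chance of at least one spurious hit is roughly 93 percent, so a single significant result out of 51 is about what chance alone would produce. The Bonferroni correction shown is one common, conservative adjustment; it is offered as an example, not as the method our sources recommend.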

What makes science communication so difficult is the need to balance all this disclosure and complexity with a clear story that people can follow. “A good compromise would be … presenting the pretty result in the main text, but having an appendix where everything that might make the result look less pretty is reported,” Fanelli says. Likewise, save your report of everything you tried in your analysis that didn’t work for the supplementary materials. “No one wants to read fifteen robustness tests in the main text, but someone who’s really interested in replicating the results will” look at the appendix, Fanelli adds. Signposting complexities in the main text might also help readers navigate between the two documents.

A hundred years ago, journal space was precious. There just wasn’t any room for multiple data sets—or even one data set—in a scientific journal article. The Internet has solved that problem. Data, materials, and sometimes even statistical packages may now be hosted on a journal’s Web site, or an institution’s. “Those files are not always peer-reviewed and it’s up to the author what to show … , but it’s good if you do have large results or you want to show maybe even multimedia,” says Wager.

Of course, it’s often difficult to figure out the main story. “The simplest answer is to try and tell the story that you think your data and findings can best defend,” Nosek says. Disclosure has this advantage: If you reveal it all, you can feel more comfortable making strong claims. “You can present a strong hypothesis, talk about how the findings are consistent with that, and then point out very openly where the data were not consistent with it, and that’s a question for future research.”

Negative results rarely leave the lab. But, “especially in the biomedical and the clinical field, I think it’s really important that negative results get published,” Wager says. Partly this is because negative results can prove important in any subsequent meta-analysis, Wager says, and partly it’s to allow other researchers to avoid wasting valuable time and resources trying what you already tried. New journals are emerging, most of them open access, that welcome publishing negative results. One such journal, for example, judges the soundness of submitted papers rather than their likely impact, Barbour says. Such papers “may not win you many citations … but in terms of how science progresses it is a good principle” to publish negative results, Wager says.

Nosek believes that “in the current incentive structure of science, it’s a lot to ask young scientists to just go ahead and write it up anyway, because it’s not very likely to get accepted, it’s not going to be a very high-impact publication.” That’s one of the reasons that he advocates an even more radical form of disclosure, where scientists make all their results and materials available online via initiatives like his Open Science Framework. While putting your data online won’t win you a publication, making your results discoverable in this way is not hard to do, Nosek says, and has advantages both for you and for science.

Done well, writing up the results of your research is a big job. Over the last decade, many reporting guidelines, some including checklists, have become available to address a range of challenges. Our sources also suggest seeking advice from colleagues and advisers—but do it critically. Many of them came of age in a time when reporting constraints made full disclosure difficult.

A balancing act

Some of the ways researchers can improve their reporting of science are easy to implement. “Just writing the methods is not going to take you more than a day or so, you know, it’s not going to hold you back,” Wager says.

Other healthy practices carry some career-related risk. If, for example, you save your data until it “make[s] a whole story, you’re perhaps more likely to get it in a higher impact journal,” Wager says. But in a competitive fast-moving area there is a lot of pressure to submit early and often, even if the data and the story aren’t quite ready yet. “Should I work on this a bit more, or should I publish it now? And that’s a difficult balance,” Wager says. If your results turn out to be wrong, your reputation and career will suffer from your haste. In “the long term, what you’re aiming for, really, is, the research that you publish is something that people can rely on,” Barbour says.  

Most scientists recognize the value to science of openness and reproducibility, but for now the career risks are real and the career benefits are unclear. On one side, young scientists can build their reputations by sharing their materials, data, analysis, and research process. “When all of those things are available for others to scrutinize, then they will be able to see, ‘Wow, this is work that is done really well. Yep, it’s a little bit messier than some of those other articles that we observed, but I can be much more confident in what I see reported,’” Nosek argues. But while young scientists are likely to benefit from such an approach in the long term, there are risks. “I want to be optimistic,” Nosek says, but “the pay-offs are not that direct.” It’s hard to say right now with confidence that real benefits will appear in the future.

Ultimately, “At this moment in time, in this form of publication system and scientific system, if you want to make a career, you need to play the game a bit, so you need to sell your result and so on. They would be naïve to say, ‘don’t do that,’” Fanelli says. But there are ways to compromise on this. “You do present a pretty, neat, short, to-the-point paper, but you give space somewhere for the not-so-pretty bits to be available,” Fanelli adds. If people who are interested in your work “can easily reach a page where they do have access to … everything you didn’t have space to put in the published paper—there you go, problem solved.”

