Spring 2016 - Safety

Irreproducible Research: The Need for Study Validation

Changes are needed in the way scientific research is currently conducted to ensure its legitimacy and efficacy.

MODERN SCIENCE is facing what could be the gravest threat to its remarkable run of success, a run that extends from the beginning of the Scientific Revolution some four and a half centuries ago. The issue is the “reproducibility crisis,” in which a significant percentage of scientific studies cannot be reproduced by others, not only undermining their value but also threatening the public’s faith in scientific research as a whole.

Many explanations are being offered as both the popular and academic press take an increasingly skeptical look at the issue. The size of the problem is not yet clearly known, nor even agreed upon, and the root causes and potential solutions are nowhere near consensus. But what nobody seems to be disputing is that a huge swath of scientific research, including much being conducted in medical fields, is basically useless. More ominously, it is also not widely disputed that the reproducibility crisis threatens the very foundations of the scientific culture in the West, upon which modern medicine rests.

Many science and medical writers are warning that the reproducibility crisis undercuts the very premise of the scientific method: the idea that science is ultimately based on facts that can be proven through observation and/or experimentation. If much of that experimentation is so flawed as to prove nothing, or if the analysis of the observations is statistically meaningless, then our science is far less efficient and effective than we have all believed.

How It’s Supposed to Work

Everyone working in the medical sciences is well aware of how the scientific method works: Let the evidence guide you where it will. Either a drug works against a disease, or it does not. A treatment leads to improvement, or it does not.

A properly designed study will account for all outside influences, and it will compare a control group with a study group. Further, in most reputable journals, no study can be published until it has been reviewed by other experts in the same discipline. This “peer review” process is designed to be the main defense against fraud and incompetence. These peers should be reviewing a study to ensure it was properly designed to account for all variables, so that the experiments truly indicate whether the drug or treatment being tested is effective. They should also be weighing whether any conclusions tied to the study truly reflect the evidence offered.

But it is this entire process that is under heightened scrutiny as hundreds, and potentially thousands or more, of research papers are being found wanting in one respect or another.

The Scientific Challenges

In the last few years, researchers in a variety of disciplines, while reviewing previous work in their fields, have discovered that they could not get the same results reported in published papers. Either the primary research or the statistical conclusions drawn from existing research proved impossible to reproduce. Some researchers have worried about this phenomenon for a while, even if the issue is only now gaining traction in the academic press.

The advocacy nonprofit Public Library of Science (PLOS) published an article 11 years ago arguing that statistical claims associated with research results were often invalid or misleading. Its author, John P. A. Ioannidis, a statistician and physician on faculty at Stanford, argues that most often it is not fraud that leads to bad studies, but the research culture itself. His belief is that out-and-out fraud is but a small sliver of bad research; instead, secrecy and misapplication of statistics theory are the main culprits.1 (Outright fraud is easier to detect and confront than systemic bias. There is even an entire website devoted to reporting on retraction of studies at retractionwatch.com.) Roger Peng, associate professor in the department of biostatistics at Johns Hopkins Bloomberg School of Public Health, backs up Ioannidis’ assertion that statistics illiteracy clouds far too many conclusions drawn from research.2
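The statistical core of that argument is easy to see with a quick back-of-the-envelope calculation. The numbers in the sketch below are illustrative assumptions (a 10 percent chance that any given hypothesis is true, 80 percent statistical power, the customary 5 percent significance threshold), not figures taken from Ioannidis’ paper, but they show how a sizable share of “significant” findings can be false without any fraud at all.

```python
# Illustrative arithmetic (our own assumed numbers, not Ioannidis' figures):
# how often is a "statistically significant" finding actually true?

prior_true = 0.10   # share of tested hypotheses that are really true
power = 0.80        # chance a real effect is detected (1 - beta)
alpha = 0.05        # chance a null effect still comes out "significant"

true_positives = prior_true * power            # 0.08
false_positives = (1 - prior_true) * alpha     # 0.045
ppv = true_positives / (true_positives + false_positives)

print(f"Share of 'significant' findings that are actually true: {ppv:.0%}")
# Roughly 64% under these assumptions; with less plausible hypotheses, weaker
# power or added bias, the share drops below half, the situation Ioannidis describes.
```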

Statistical error may come from not understanding how to properly account for anomalies that arise during the course of an experiment. For instance, PLOS is conducting an analysis of how researchers account (or fail to account) for rodents that die from seemingly unrelated issues during the course of medical experiments.3 An earlier review of rodent experiments found that a significant share reported fewer test subjects at the conclusion of the research than at the beginning, with no accounting for the discrepancy.
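To see why unexplained attrition matters, consider a toy simulation (our own illustration, not drawn from the PLOS analysis or the earlier review): if the animals most likely to die are also the least healthy ones, silently dropping them from the final tally makes the surviving group look healthier than the cohort that actually began the experiment.

```python
import random

# Toy simulation (illustrative only): animals that start out frail are more
# likely to die during the study. If those deaths are silently dropped, the
# surviving group looks healthier than the cohort that began the experiment.

def run_trial(n=100, drop_deaths=True):
    """Return the mean outcome score for a group of n rodents."""
    outcomes = []
    for _ in range(n):
        baseline = random.gauss(50, 10)                  # underlying health score
        died = baseline < 40 and random.random() < 0.5   # frail animals die more often
        if died and drop_deaths:
            continue                                     # unreported attrition: animal vanishes
        outcomes.append(0 if died else baseline)         # otherwise a death counts as a failure
    return sum(outcomes) / len(outcomes)

for drop_deaths in (True, False):
    random.seed(0)   # same cohort of animals in both analyses
    label = "silently dropped" if drop_deaths else "counted as failures"
    print("Mean score, deaths %s: %.1f" % (label, run_trial(drop_deaths=drop_deaths)))
```

The gap between the two printed means is the kind of distortion that an unexplained drop in subject counts can quietly introduce.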

Beyond errors that escape peer review are the pressures of the highly competitive, cutthroat world of academic research in the United States and Western Europe. With faculty members, adjuncts and doctoral candidates all having to produce original research to get or keep a job in their chosen field, or simply to finish a degree, the pressure to publish is enormous. As noted in 2013 by Fiona Fidler and Ascelin Gordon on the Phys.org website, “There can be little doubt that the ‘publish or perish’ research environment fuels this fire. Funding bodies and academic journals that value ‘novelty’ over replication deserve blame too.”4

The growing number of participants in science and academia only adds to these pressures. There are far more working-class and middle-class students qualifying for and being accepted into four-year universities today than was true a half-century ago — a 300 percent increase in the number of college and university students in the United States, far outstripping population growth.5 The United States alone has gone from turning out an average of 545 science and engineering doctorates a year in the 1920s to more than 27,000 in 2010.6

While this growth has provided an undoubted burst of democratization to what was formerly a preserve of the rich, we now also have more and more researchers fighting for grant money and tenure-track positions, increasing competitive pressures. Just in the years from 1997 to 2011, funding requests to the National Institutes of Health doubled, from 31,000 research grant applications to 62,000.6

The result of more people competing for the same resources is probably predictable. Chris Chambers, professor of psychology and neuroscience at Cardiff University in Britain, said “significance chasing” — aiming for the highest perceived level of interest in order to attract more attention and funding — is inadvertently leading to bad research.7 (Ironically, his March 2015 comments came at a conference at University College London just a week before BioMed Central, a major publisher of medical journals, announced the retraction of 43 published papers for reasons of peer review fraud.8 It was a BioMed Central blog that quoted his comments made at the conference.)

The anonymous “Neuroskeptic” blogger for Discover magazine, along with one of his regular readers commenting on the post, wonders whether the fact that most academic and scientific journals publish only “significant” results adds to the pressure, a problem Neuroskeptic refers to as “publication bias.” Neuroskeptic also worries about “p-hacking,” often referred to as data dredging, in which existing data is automatically searched (via computer algorithms) for statistical anomalies. Rather than looking for evidence to support or refute an existing hypothesis, the patterns themselves become the subject of the search; once found, hypotheses are then developed to explain them.9
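How easily dredging produces spurious findings can be shown with another toy simulation (a generic illustration of the multiple-comparisons problem, not a reconstruction of any cited study): screen enough pure-noise variables for a “significant” relationship and a few will clear the usual 5 percent threshold by chance alone.

```python
import random

# Toy illustration of data dredging / multiple comparisons (hypothetical data):
# the outcome and every predictor are pure noise, yet screening all of them
# for correlations still turns up "significant" hits.

random.seed(1)

def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def perm_p_value(xs, ys, trials=1000):
    """Two-sided permutation p-value for the correlation between xs and ys."""
    observed = abs(corr(xs, ys))
    ys = list(ys)
    hits = 0
    for _ in range(trials):
        random.shuffle(ys)
        if abs(corr(xs, ys)) >= observed:
            hits += 1
    return hits / trials

n_subjects, n_predictors = 30, 50
outcome = [random.gauss(0, 1) for _ in range(n_subjects)]

false_positives = sum(
    1
    for _ in range(n_predictors)
    if perm_p_value([random.gauss(0, 1) for _ in range(n_subjects)], outcome) < 0.05
)
print(f"'Significant' correlations found in pure noise: {false_positives} of {n_predictors}")
```

Pre-specifying which comparison will be tested, or correcting the significance threshold for the number of comparisons actually made, closes off this particular route to spurious findings.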

All of these different pressures to get published, to stand out from other researchers and to secure funding are likely introducing unintended bias into the conclusions reached, if not into the research itself. And yet, just as the ability to reproduce a study would seem to be more important than ever, these same pressures are leading many academics to show great reluctance to share the details of their research. This protects their intellectual property, but it also makes it nearly impossible for others to replicate their work. As Fidler and Gordon pointed out in their Phys.org commentary, “Data sharing and other procedures outlined here can be time-consuming and currently provide little academic reward.”4

The Cultural Challenge

Coverage of these issues has begun leaking over to the mainstream press. The first big waves in the media came in 2012, when pharmaceutical researchers C. Glenn Begley and Lee Ellis dove into 53 supposedly groundbreaking oncology studies from 2001 to 2011 and could reproduce the findings of only 11 percent of them.10 Drug companies took notice, as they were pouring billions of dollars in private research money into new studies designed to build on the results of earlier, and suddenly questionable, work.

More reports about irreproducible research followed in the popular media. Time magazine weighed in on the issue in 2014,11 as did Wired.12 Last July, the popular science blog io9 published a piece titled “Half of Biomedical Research Studies Don’t Stand Up to Scrutiny.”13 Smithsonian magazine addressed it earlier this year.14

These reports might not sound as ominous as the more rigorous scientific papers, but public support for the sciences is essential to preserving or even increasing government funding of research. Most funding for basic scientific research comes from governments, with grants awarded based on a combination of past success and the potential for gains in new knowledge.

It is a shared cultural belief in the ability of science to provide important advances in our understanding of the universe, and to apply those new insights in ways that improve our quality of life, that makes it possible for the government to invest heavily in scientific research, particularly medical research. Threaten that belief in the legitimacy and efficacy of scientific research, and you threaten public funding. Without voter support, government funding cannot survive, particularly in democratic systems, and certainly not at the levels to which we have become accustomed.

Finding Solutions

It is likely that further study of the issue is needed before a consensus emerges on the scope and nature of the problem. And without a broader consensus, it will be difficult to change the overarching culture of scientific research (including the disinclination to fully disclose study parameters).

Still, changes in the way research is conducted — or at least funded and published — are already underway. The National Institutes of Health (NIH) has beefed up its grant application process, requiring more explanation of the science behind the proposed study and more rigorous efforts to eliminate variables.15 This, of course, applies only to medical studies, but the NIH standards will encourage other government agencies to at least take notice.

As gatekeepers of information about studies, many medical and academic journals are changing the way they accept papers for publication. PLOS now requires full disclosure of all data before it will publish any research studies.16 And Nature magazine is offering data repository agreements to encourage public sharing of research data among its contributors.17 Nature has gone so far as to devote an entire online hub to the topic at www.nature.com/news/reproducibility-1.17552.

And the debate on what else ought to be done continues. Neuroskeptic proposes peer review of research studies before they even begin, with journals committing to publish the results no matter what they are.18 Peng, the Johns Hopkins biostatistician, proposes enhanced instruction in statistics for budding researchers in all scientific disciplines to improve the quality of the conclusions drawn from study results.19 Fidler and Gordon suggest a reproducibility index, which they argue would require more sober statistical analysis of research results. They also propose that researchers share their computer code, such as their search algorithms, along with the data used in the study being reported, so that others can make a true “apples-to-apples” comparison.4

It will likely take a combination of all these proposals to begin changing the culture of medical and scientific research. But with so many billions of dollars at stake in both private and public research funds, the current uproar over the “reproducibility crisis” is unlikely to lessen until system reforms are put in place.

 

References

  1. Ioannidis JPA. Why Most Published Research Findings Are False. PLOS Medicine, Aug. 30, 2005. Accessed at journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124.
  2. Peng RD. The Reproducibility Crisis in Science: A Statistical Counterattack. Significance, June 2015. Accessed at onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2015.00827.x/abstract.
  3. Inglis-Arkell E. Rodents That Go Missing in Scientific Papers Can Skew Results. Gizmodo, Jan. 4, 2016. Accessed at gizmodo.com/rodents-that-go-missing-in-scientific-papers-can-skew-r-1750891512.
  4. Fidler F and Gordon A. Science Is in a Reproducibility Crisis: How Do We Resolve It? Phys.Org, Sept. 20, 2013. Accessed at phys.org/news/2013-09-science-crisis.html.
  5. Brock T. The Changing Landscape of Higher Education: 1965-2005. The Future of Children, Spring 2010. Accessed at futureofchildren.org/publications/journals/article/index.xml?journalid=72&articleid=523&sectionid=3589.
  6. Howard DJ and Laird FN. The New Normal in Funding University Science. Issues in Science and Technology, Fall 2013. Accessed at issues.org/30-1/the-new-normal-in-funding-university-science.
  7. Bal L. Is Science Broken? The Reproducibility Crisis. BioMed Central, March 20, 2015. Accessed at blogs.biomedcentral.com/on-biology/2015/03/20/is-science-broken-a-reproducibility-crisis.
  8. Barbash F. Major Publisher Retracts 43 Scientific Papers Amid Wider Fake Peer-Review Scandal. The Washington Post, March 27, 2015. Accessed at www.washingtonpost.com/news/morning-mix/wp/2015/03/27/fabricated-peer-reviews-prompt-scientific-journal-to-retract-43-papers-systematic-scheme-may-affect-other-journals.
  9. Neuroskeptic. Reproducibility Crisis: The Plot Thickens. Discover, Nov. 10, 2015. Accessed at blogs.discovermagazine.com/neuroskeptic/2015/11/10/reproducibility-crisis-the-plot-thickens.
  10. Begley CG and Ellis LM. Drug Development: Raise Standards for Preclinical Cancer Research. Nature, March 29, 2012. Accessed at www.nature.com/nature/journal/v483/n7391/full/483531a.html.
  11. Baldwin M. Is the Peer Review Process for Scientific Papers Broken? Time, April 29, 2014. Accessed at time.com/81388/is-the-peer-review-process-for-scientific-papers-broken.
  12. Scientific Peer Review Is Broken. We’re Fighting to Fix It with Anonymity. Wired, Dec. 10, 2014. Accessed at www.wired.com/2014/12/pubpeer-fights-for-anonymity.
  13. Oransky I. Half of Biomedical Research Studies Don’t Stand Up to Scrutiny. io9, July 29, 2015. Accessed at io9.gizmodo.com/half-of-biomedical-research-studies-dont-stand-up-to-sc-1720835208.
  14. Hoffman A. Biomedical Science Studies Are Shockingly Hard to Reproduce. Smithsonian, Jan. 4, 2014. Accessed at www.smithsonianmag.com/science-nature/biomedical-science-studies-are-shockingly-hard-reproduce-180957708/?no-ist.
  15. Repetitive Flaws. Nature, Jan. 20, 2016. Accessed at www.nature.com/news/repetitive-flaws-1.19192.
  16. Bloom B. PLOS’ New Data Policy: Part Two. PLOS, March 8, 2014. Accessed at blogs.plos.org/everyone/2014/03/08/plos-new-data-policy-public-access-data.
  17. Data-Access Practices Strengthened. Nature, Nov. 19, 2014. Accessed at www.nature.com/news/data-access-practices-strengthened-1.16370.
  18. Neuroskeptic. Fixing Science’s Chinese Wall. Discover, Dec. 21, 2013. Accessed at blogs.discovermagazine.com/neuroskeptic/2013/12/21/judging-science/#.Vq7hKnarTC0.
  19. American Statistical Association. Roadmap to Fight Reproducibility Crisis. Science Daily, June 16, 2015. Accessed at www.sciencedaily.com/releases/2015/06/150616123914.htm.
Jim Trageser
Jim Trageser is a freelance journalist in the San Diego, Calif., area.