Neglect Causes Massive Loss of ‘Irreplaceable’ Research Data


Research data kept by scientists tends to disappear ‘surprisingly’ quickly.

Research scientists could learn an important thing or two from computer scientists, according to a new study showing that data underpinning even groundbreaking research tends to disappear over time.

Researchers also disappear, though more slowly and only in terms of the email addresses and the other public contact methods that other scientists would normally use to contact them.

Almost all the data supporting studies published during the past two years is still available, as are at least some of the researchers, according to a study published Dec. 19 in the journal Current Biology.

The odds that supporting data is still available for studies published between 2 years and 22 years ago drop 17 percent every year after the first two. The odds of finding a working email address for the first, last or corresponding author of a paper also drop 7 percent per year, according to the study, which examined the state of data from 516 studies between 2 years and 22 years old.
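To get a feel for how quickly a yearly drop in odds compounds, the rough Python sketch below multiplies the odds by a fixed factor each year. It assumes the 17 percent figure applies as a constant multiplicative decline, which is a simplification rather than the study's actual statistical model. The function names and the starting odds of 9 to 1 are hypothetical, chosen only for illustration and not taken from the study.

```python
def odds_multiplier(years_after_first_two: float, annual_drop: float = 0.17) -> float:
    """Fraction of the starting odds left after compounding a yearly drop.

    ASSUMPTION: the drop is a constant multiplicative factor each year.
    """
    return (1.0 - annual_drop) ** years_after_first_two


def probability_from_odds(odds: float) -> float:
    """Convert odds, defined as p / (1 - p), back into a probability p."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # ASSUMPTION: illustrative starting odds of 9 to 1 (a 90 percent chance)
    # that data is available two years after publication. Not a study figure.
    starting_odds = 9.0
    for years in (0, 5, 10, 20):
        multiplier = odds_multiplier(years)
        probability = probability_from_odds(starting_odds * multiplier)
        print(f"{years:2d} years on: odds x{multiplier:.3f}, "
              f"probability about {probability:.2f}")
```

Passing `annual_drop=0.07` gives the same kind of estimate for the odds of finding a working email address. Note that a 17 percent drop in odds is not the same as a 17 percentage point drop in probability, which is why the sketch converts between the two.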

Having data available from an original study is critical for other scientists wanting to confirm, replicate or build on previous research – goals that are core parts of the evolutionary, usually self-correcting dynamic of the scientific method on which nearly all modern research is based.

No matter how invested they are in their own work, scientists appear to be “poor stewards” of it, the study concluded.

The most common reasons data became unavailable were broken email systems and obsolete storage formats or technology, according to the study. “I don’t think anybody expects to easily obtain data from a 50-year-old paper, but to find that almost all the datasets are gone at 20 years was a bit of a surprise,” lead author Tim Vines of the University of British Columbia told The Telegraph for a story posted Dec. 19. (Information supporting the story is, presumably, still available, though the Telegraph did not include confirmation that it is.)

Losing data from publicly funded research, or allowing it to become unavailable, is tantamount to allowing the random destruction of public property, especially in studies that generate large volumes of data that are expensive or difficult to recreate, or that are unique to a time and place and are therefore irreplaceable, he said.

Though it violates the assumptions of open data-sharing among researchers, many scientists turn out to be unwilling to share data for competitive or financial reasons, in addition to those unable to provide data due to poor record-keeping or other failings, according to the report.

Governmental and non-governmental agencies are increasingly demanding that researchers provide them with copies of the data underlying studies they have funded; some journals are doing the same for studies they publish, and for the same reasons, the researchers found.

In February 2013, the Obama administration directed federal agencies that spend more than $100 million per year on research and development to make their results freely available to the public within a year after publication “and requiring researchers to better account for and manage digital data resulting from federally funded scientific research.”

The resulting policy statement (PDF) excepts some data on privacy or national security grounds, however, and doesn’t define a single repository for the data, such as the National Archives.

The National Institutes of Health has been developing policies requiring public access to data and research since at least 2005, with caveats and exceptions for patented information, privacy and other concerns.

Those efforts and others like them are far from universal and not consistently effective, however, according to Vines and the other authors, who call for stronger data-retention policies from funding agencies, as well as legislation classifying data from publicly funded research as public property that should be protected.

“Our results reinforce the notion that, in the long term, research data cannot be reliably preserved by individual researchers, and further demonstrate the urgent need for policies mandating data sharing via public archives,” the study reads.

 

