Everybody agrees that ensuring the integrity and accessibility of research data is crucial for scientific progress. Agreeing on the best way to do so is the hard part, says Nature Medicine in its February Editorial (16, 131; 2010).
Technological advances have enabled researchers to tackle questions that involve generating vast amounts of data, posing challenges concerning data analysis, manipulation, annotation, sharing and storage that researchers, institutions, funders and journals have not yet fully grasped. How should data be annotated before being stored in a database so that it can be as useful as possible to other researchers? Should data-sharing requirements be extended to the computer codes that were used to analyze the data? Who should have access to the data, and who pays for data storage and management?
These questions will become more pressing as further technological advances make it even easier to produce ever larger data sets, and it won’t be simple to come up with the answers. The US National Academy of Sciences, the National Academy of Engineering and the Institute of Medicine published a report called Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age, focusing on data integrity, access and long-term preservation, and providing a useful framework around which to organize what has become an urgent dialogue.
The conclusions of the report, while worthy, are hardly news to those who have pondered these issues. At the Nature journals, for example, data sharing has long been a requirement for publication, and when other researchers report difficulty in obtaining data and materials, the editors insist directly to authors that they fulfill their commitment to sharing. The merit of the academies’ report does not lie in its recommendations but in its disciplined analysis of the current state of play, its multidisciplinary perspective on the problems and its identification of the tough questions that scientists, institutions, funders and journals need to answer to move forward, even though it provides little in terms of answers.
The Editorial goes on to analyze the issues raised by the report in more depth, concluding that scientists themselves should develop the right standards, lobby for the resources to set up the appropriate infrastructure and decide on the right measures to deter other scientists from data mismanagement. Data may not be the legal property of scientists, but looking after the data is certainly their responsibility.