Nature Methods on “big data” and the scientific method

The rise of ‘omics’ methods and data-driven research presents new possibilities for discovery but also stimulates disagreement over how science should be conducted and even how it should be defined. Is the ability of these methods to amass extraordinary amounts of data altering the nature of scientific inquiry? These are the issues discussed in the April Editorial of Nature Methods (6, 237; 2009).

“Methodological developments are now making it possible to obtain massive amounts of ‘omics’ data on a variety of biological constituents. These immense datasets allow biologists to generate useful predictions (for example, gene-finding and function or protein structure and function) using machine learning and statistics that do not take into account the underlying mechanisms that dictate design and function—considerations that would form the basis of a traditional hypothesis.

Now that the bias against data-driven investigation has weakened, the desire to simplify ‘omics’ data reuse has led to the establishment of minimal information requirements for different types of primary data. The hope is that this will allow new analyses and predictions using aggregated data from disparate experiments.”

The Editorial goes on to ask whether the generation of parts lists and correlations in the absence of functional models is, in fact, science. “Based on the often accepted definition of the scientific method, the answer would be a qualified no. But the rise of methodologies that generate massive amounts of data does not dictate that biology should be data-driven. In a return to hypothesis-driven research, systems biologists are attempting to use the same ‘omics’ methods to generate data for use in quantitative biological models. Hypotheses are needed before data collection because model-driven quantitative analyses require rich dynamic data collected under defined conditions and stimuli.

Correlations in large datasets may be able to provide some useful answers, but not all of them: ‘omics’ data can provide information on the size and composition of biological entities and thus determine the boundaries of the problem at hand. Biologists can then proceed to investigate function using classical hypothesis-driven experiments. It is still unclear whether even this marriage of the two methods will deliver a complete understanding of biology, but it arguably has a better chance than either method on its own.”

Comment on this Editorial at Nature Methods’ Methagora blog.

