Soapbox Science

Bioinformatics: what is it and how can it bring prehistory to life?


Ivan Karabaliev joined Eagle Genomics located at the Babraham Research Centre in Cambridge, UK, a bit more than a year ago and has been discovering the essence of bioinformatics. Coming from a business marketing background, Ivan likes to explain the complex world of bioinformatics to new audiences and the general public.

Explained in just one sentence, bioinformatics is the science of managing, analysing, storing and merging biological data (DNA sequences, proteins, etc.) using advanced computing techniques. Put another way, it is the application of computer science and information technologies to solve biological questions. Simple questions include asking what a specific region of given DNA is responsible for, or how closely related one organism is to another by comparing their genomes.
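To make the idea of comparing genomes a little more concrete, here is a minimal Python sketch. It is not how any particular research tool works; it simply breaks two DNA strings into short overlapping "words" (k-mers) and measures how many they share. The sequences and the function names `kmers` and `kmer_similarity` are made up for illustration.

```python
def kmers(sequence: str, k: int = 3) -> set:
    """Return the set of all overlapping substrings of length k."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def kmer_similarity(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Jaccard similarity of the two k-mer sets: shared k-mers / all k-mers."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a and not b:
        return 0.0  # nothing to compare (both sequences shorter than k)
    return len(a & b) / len(a | b)


# Toy sequences, invented for this example (not real genes).
reference = "ATGGCGTACGTTAGC"
close_relative = "ATGGCGTACGATAGC"    # differs from the reference by one base
distant_relative = "TTTTCCCCAAAAGGG"  # shares no 3-letter words with it

print(kmer_similarity(reference, close_relative))
print(kmer_similarity(reference, distant_relative))
```

A score near 1 suggests the two sequences are very alike and a score near 0 suggests they have little in common. Real comparisons work on millions or billions of bases and use far more careful alignment methods, but the underlying question is the same.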

The genome is the entirety of an organism’s hereditary information: its complete genetic make-up. It contains the instructions needed for a living organism to grow and function. When we know the sequence of a gene, the role it has in an organism and the diseases caused by malfunctioning copies of the gene, this information can be used to improve life for the organism. This is where bioinformatics comes in, to better interpret and understand genetic messages.

The genomes of organisms, some of which can be several billion DNA base pairs long, can be stored in biological databases. The data stored may include gene function, structure, localization (both cellular and chromosomal), physiological or clinical effects of genetic mutations, as well as similarities of biological sequences and structures.

In 1990 the Human Genome Project was formally given the green light, driven by the need to understand and help cure human diseases, and the genomic revolution took its first steps. The project was led by Dr. Francis Collins, head of the National Human Genome Research Institute. The whole human genome, which is about 3 billion base pairs long, was declared sequenced in 2000. The news was proclaimed by Bill Clinton:

Humankind is on the verge of gaining immense, new power to heal. It will revolutionize the diagnosis, prevention and treatment of most, if not all, human diseases!

You can watch a YouTube video of the announcement here. During the announcement a very important fact was neglected: the sequence was not truly complete, but a mere first draft. About 10 percent of the human genome had not been read.

It wasn’t until 2003 that the human genome’s sequencing was officially completed. Since then, along with the constant improvement of bioinformatics, genetic investigations have enabled the development of new tests, drug targets and have given fresh insights into the basis of human disease. However, these pioneering investigations have also revealed just how complicated human biology is and how much remains to be understood.

The Human Genome Project is a great example of the application of bioinformatics. The project stores huge amounts of genetic data in a database that maintains human genome sequences. Researchers write complex, biologically aware algorithms to analyse this massive amount of information and to compare it with other related data. This supports the efficient sequencing and identification of all three billion chemical units in the human genetic instruction set, helping to find the genetic roots of diseases. But this is just one example of how bioinformatics can be used. Below is an overview of some of the other interesting applications of bioinformatics:

• In the Microbial Genome Project, scientists are determining the DNA sequence of C. crescentus, one of the microorganisms used for sewage treatment. Genomes of highly resistant bacteria are sequenced and analysed to aid the waste treatment industry. Some bacteria can reduce levels of uranium in water. Other bacterial species, such as Geobacter, are capable of breaking down petroleum compounds so that polluted waters can be treated.

• Efforts to tackle climate change can also be aided by bioinformatics. How? The US Department of Energy launched a program to decrease atmospheric carbon dioxide levels. One method of doing so is to study the genomes of microbes that use carbon dioxide as their sole carbon source.

• In the food industry, researchers anticipate that understanding the physiology and genetic make-up of Lactococcus lactis, a bacterium used in the dairy industry (buttermilk, yogurt, cheese) and also to prepare pickled vegetables, beer, wine and breads, will prove invaluable for food manufacturers as well as the pharmaceutical industry. Similar advances are expected in forensic science, where bioinformatics tools are used to compare crime-scene samples with existing databases to see whether they are present there or whether they are related to other microbes.

• Another, potentially controversial, application of bioinformatics is in defence. Scientists have built the poliovirus, the virus that causes poliomyelitis, using entirely artificial means. They did this using genomic data available on the Internet and materials from a mail-order chemical supplier. The research was financed by the US Department of Defense as part of a biowarfare response program to prove to the world the reality of bioweapons. The researchers also hope their work will discourage officials from ever relaxing immunization programs.

• In agriculture, sequencing the genomes of plants and animals has enormous benefits. Bioinformatics tools are used to search for potentially useful genes within these genomes and to elucidate their functions. The gathered genetic knowledge could then be used to produce stronger crops that are more resistant to drought, disease and insects, or to improve the quality of livestock, making them healthier, more disease-resistant and more productive.
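Several of the applications above come down to searching a genome for stretches that might be genes. As a rough illustration only, the Python sketch below scans a DNA string for "open reading frames": runs that begin with the start signal ATG and continue, three letters at a time, until a stop signal (TAA, TAG or TGA). The function name `find_orfs`, the `min_codons` cut-off and the toy sequence are assumptions made for this example, and it reads one strand only; real gene-finding tools are considerably more sophisticated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list:
    """Return (start, end) index pairs for ATG...stop runs on the forward strand."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # a gene can begin at any of three reading offsets
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # remember where this candidate gene begins
            elif start is not None and codon in STOP_CODONS:
                # keep the candidate only if it is long enough (stop codon included)
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Toy sequence invented for this example (not a real gene).
toy_genome = "CCATGAAATTTTAGGG"
for start, end in find_orfs(toy_genome):
    print(start, end, toy_genome[start:end])
```

Each reported stretch is only a candidate: deciding whether it is a real gene, and what that gene does, still takes comparison with known sequences and, ultimately, lab work.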

Future uses of bioinformatics

• Medicine will become more personalised with the development of the field of pharmacogenomics, which is the study of how an individual’s genetic make-up affects the body’s response to drugs. At present, many drugs fail to make it to the market because a small percentage of patients show adverse effects, often due to sequence variants in their DNA.

• Enhancement of gene therapies. Gene therapy is the approach used to treat, cure or even prevent disease by changing the expression of a person’s genes. This field is still in its infancy, but there are many ongoing clinical trials for different types of cancer and other diseases.

• And finally my favourite example for potential use of bioinformatics is in sequencing dinosaur DNA. Remember Spielberg’s movie Jurassic Park based on the book by Michael Crichton? Scientist Mark Boguski read the book and decided to do a simple experiment to replicate the movie’s premise of dinosaur DNA having been preserved inside an amber-encased mosquito. He found out that the genetic sequence quoted in the book and movie had nothing to do with dinosaurs, so he wrote a journal article about his findings. Crichton came across this manuscript and approached Boguski to provide him with a real DNA sequence for his second book: The Lost World. (Read the full story here.) This is the actual paper where Boguski wrote his findings:


Bioinformatics isn’t going to replace lab experiments any time soon. For now it is best used to help “focus” and complement scientific research. In most cases, bioinformatics helps to eliminate false positives, saving the time and money that would otherwise be spent pursuing false leads. However, with the ever-increasing volumes of data, bioinformatics has become an important part of all genomic research projects and the future is bright. As genomic and molecular research technologies improve, in line with developments in information technology, bioinformatics is becoming a major player in the understanding of biological processes and disease.

