Monday, March 12, 2012

Bioinformatics: A Gentle Overview

In a city in France, the great Louis Pasteur (1822-1895) was studying how the fermentation of alcohol was linked to the existence of a specific microorganism. In a city in England, the equally great Charles Babbage (1791-1871) was oiling his Analytical Engine, for which Ada Lovelace, a mathematician who understood Babbage's vision, was trying to calculate the Bernoulli numbers. These gentlemen are today hailed as the father of biotechnology and the father of computers, respectively. Did Pasteur and Babbage ever meet? They had about 25 years to do so, and were less than 1000 km apart. We do not know if they ever met, but had they met, they possibly would not have talked to each other! If I may be pardoned for a politically incorrect joke, remember that Pasteur was French and Babbage was British! Anyway, what would they have had in common to talk about, other than the weather? What is there in common between the gear wheels turning away in an attempt to crunch numbers and the microbes playing a mysterious role in fermenting alcohol?
Is this still true today? Not a bit, not even by as much as a bacterium. It seems imminent, if not already true, that Biology and Computers are becoming close cousins that respect, help and influence each other, and are merging synergistically, more than ever. The flood of data from Biology, mainly in the form of DNA, RNA and protein sequences, is putting heavy demands on computers and computational scientists. At the same time, it is demanding a transformation of the basic ethos of the biological sciences. A common misconception is that bioinformatics is about creating and managing biological databases. Nothing could be further from the truth. Fine analytical and engineering skills are in great demand in the area, as seen in the vigorous attempts to apply machine learning to the protein folding and gene-finding problems. The great Donald Knuth, renowned Stanford computer science professor, is often quoted for pointing out that biology has 500 years of exciting problems to work on. He feels that biology is "so digital, and incredibly complicated, but incredibly useful" (Computer Literacy Interview with Donald Knuth by Dan Doernberg, December 1993). However, there are still some spokes in the wheel for the grand union between the two great sciences and their offshoot technologies. Owing to the estrangement that existed for many decades, professionals from both fields have a lot to do in terms of fine-tuning their communication. Skepticism from purists in both fields towards the claim of Bioinformatics to be an independent field also needs convincing answers.
Many universities the world over have started teaching and research in the area. Journals are plentiful, and so are conferences and professional meetings. As the disciplines of bioinformatics and computational biology gain prominence day by day, an industry is also emerging fast on their shoulders, estimated at $1.82 billion in 2007. Bioinformatics has taken on a new glitter by entering the field of drug discovery in a big way. This is one area that seems to be becoming the single largest bioinformatics application, from an industry viewpoint. In India, it has special relevance in the context of the recent patent amendment that has brought in product patents.
There has been a green shift in all prominent technology publications. IEEE, the world's biggest professional society of technologists, has prominently adopted such a shift. I did a quick check. If you search the IEEE Digital Library for the keyword "biology", limiting the year of search, you get the following hits for the years indicated in brackets: 13 (1975), 40 (1985), 3484 (1990), 9617 (1995), 16233 (2000) and 27526 (2006). I did this on 26 November 2006, among the 1,432,467 documents in the database. About 2% of the documents have been greened! One of the latest additions to the prestigious IEEE Transactions series is IEEE/ACM Transactions on Computational Biology and Bioinformatics. It may be noted that biological motivation has a long history in the computer field, from artificial neural networks and genetic algorithms to the recent ant-colony optimization techniques. In the early days, applications of computers in biology were mostly in the biomedical field. One new facet that has emerged with Bioinformatics is the focus on the sub-cellular and molecular levels of Biology. Systems biology promises great growth in modeling cellular life using a conventional engineering approach, as already pointed to by projects such as E-Cell.
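For readers who like to check the arithmetic, the "about 2%" figure from the IEEE search can be reproduced in a few lines of Python. This is only a sketch: the numbers are copied from the paragraph above (the database total is 1,432,467 in international digit grouping), the variable names are my own, and I am assuming that the percentage compares the 2006 hit count with the whole database.

```python
# Hit counts for the keyword "biology", copied from the search described above.
hits_by_year = {1975: 13, 1985: 40, 1990: 3484, 1995: 9617, 2000: 16233, 2006: 27526}

# Total documents in the database on 26 November 2006.
total_documents = 1_432_467

# Assumption: the "about 2%" figure is the 2006 count as a share of the total.
share_2006 = 100 * hits_by_year[2006] / total_documents

# How many times larger the 2006 count is than the 1975 count.
growth = hits_by_year[2006] / hits_by_year[1975]

print(f"Share of documents mentioning biology: {share_2006:.2f}%")
print(f"Growth in hits from 1975 to 2006: about {growth:.0f}x")
```

The share works out to a little over 1.9%, which rounds to the 2% quoted, and the 2006 count is more than two thousand times the 1975 count.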
I will attempt to give the big picture of Bioinformatics by presenting the basic ideas in minimal technical vocabulary, aimed specifically at the IT community. I have nothing against life scientists attempting to read this, and I think it could be useful in patches to them also. They are, however, likely to be uncomfortable with my bio-wisdom.
Bioinformatics is the application of computer science and allied technologies to answer the questions of biologists about the mysteries of life. A mere application of computers to solve any problem of a biologist would not merit a separate discipline. It looks as if Bioinformatics is mainly concerned with problems involving data emerging from within the cells of living beings. In this context, it might be appropriate to say that Bioinformatics deals with the application of computers to solving problems of molecular biology.
What is the difference between Bioinformatics and Computational Biology? This is a bit tricky. Both are "Computers + Biology", but the order matters. The difference is subtle but important. Bioinformatics = Biology + Computers, whereas Computational Biology = Computers + Biology. In other words, biologists who specialize in the use of computational tools and systems to answer problems of biology are bioinformaticians. Computer scientists, mathematicians, statisticians and engineers who specialize in developing theories, algorithms and techniques for such tools and systems are computational biologists. Arguably, there will be overlaps, but one can also identify some clear demarcations. I am yet to find a biologist who is at absolute ease in understanding, let alone developing, a hidden Markov model, which is a machine learning paradigm used extensively in Bioinformatics.
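For IT readers who have not met a hidden Markov model before, here is a minimal sketch of the idea in Python. It is not taken from any real bioinformatics tool. It assumes a toy model with two hidden states, which I have called "AT-rich" and "GC-rich" regions of DNA, and the probabilities are made up purely for illustration. The function uses the standard Viterbi algorithm to find the most likely sequence of hidden states behind a given DNA string.

```python
import math

# Toy model: every name and number below is an illustrative assumption.
STATES = ("AT-rich", "GC-rich")
START = {"AT-rich": 0.5, "GC-rich": 0.5}
TRANS = {
    "AT-rich": {"AT-rich": 0.9, "GC-rich": 0.1},
    "GC-rich": {"AT-rich": 0.1, "GC-rich": 0.9},
}
EMIT = {
    "AT-rich": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC-rich": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}


def viterbi(seq):
    """Return the most likely hidden-state path for a DNA string.

    Works in log space to avoid underflow on long sequences.
    Symbols other than A, C, G and T raise a KeyError.
    """
    if not seq:
        return []

    # Best log-probability of any path ending in each state at position 0.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    backpointers = []

    for symbol in seq[1:]:
        prev = scores[-1]
        column, pointers = {}, {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path score into s.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            column[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][symbol])
            pointers[s] = best
        scores.append(column)
        backpointers.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


print(viterbi("AAAAGGGG"))
```

With these made-up numbers, the string "AAAAGGGG" is labeled as four "AT-rich" positions followed by four "GC-rich" positions. Real models, such as those used for gene finding, have many more states, and their probabilities are estimated from data rather than typed in by hand.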