CAMBAM seminar speaker: Dr. Sandrine Dudoit
In our last seminar talk, we had the chance to host Sandrine Dudoit, from the University of California, Berkeley. As a biologist turned bioinformatician who works with RNA-Seq data, I was very excited to attend her talk. And I was not disappointed.
High-throughput sequencing technologies are rapidly taking over the field of genomics, thanks to improvements in the quality and cost of sequencing experiments. And, of course, the fact that they are relatively easy for the general public to understand facilitates funding. However, such studies may turn out to be more complex than anticipated…
In her talk, Dr. Dudoit outlined very nicely some of the computational and statistical challenges of these technologies. She covered issues from the definition of a biological question and experimental design to the actual data analysis and validation. Focusing on experimental design and higher-level data analysis, Dr. Dudoit presented a thorough study of the problems that can beset this technology. Her results show how much more work is needed in several areas, including controls, normalization, and accounting for the complexity and heterogeneity of biological data.

Dr. Dudoit and Dr. Glass conversing - Bioinformatics meets Mathematical Biology. Photo: Juli Atherton
After the talk, I bothered her with a short interview by e-mail. When asked about the challenges she faces as a statistician speaking to the biological community, she answered:
In spite of the increasing volume of research at the interface between the biological and mathematical sciences, I believe there are still significant communication hurdles between the two fields. Rarely does a single individual possess sufficient depth in biology, statistics, and computing to meet the challenging questions routinely encountered in computational biology. Such questions are best addressed by interdisciplinary teams of experts in each of the relevant fields. Effective communication between biological and mathematical scientists is therefore key to the success of this enterprise. A common language and mutual understanding between biological and mathematical scientists can be achieved only if mathematical scientists acquire proper minimal training in biology and vice versa.
I was also curious to hear her opinion about the enormous amount of biological data that is currently being produced. Is it followed by good development of analytical tools? Which are the areas of mathematical biology that should be further promoted?
I think that biologists in academia and industry are generating far more data than can be properly analyzed. The gap between the flood of data and dearth of statistical methodology extends from exploratory data analysis and visualization to high-dimensional inference methods for the joint analysis of multiple and diverse datasets with complex dependence structures among variables (e.g., joint analysis of mRNA-Seq data and Gene Ontology metadata). There is a need for better standards for validating and benchmarking novel statistical methods, e.g., based on negative and positive biological controls and properly designed in silico experiments. High-quality statistical software and reproducible research are also key to progress in computational biology.
Take-home message: there is still much work to be done. Good for us: it seems we’ll be much needed for a while!