CAMBAM seminar speaker: Dr. Sandrine Dudoit

In our last seminar, we had the chance to host Sandrine Dudoit from the University of California, Berkeley. As a biologist turned bioinformatician who works with RNA-Seq data myself, I was very excited to attend her talk, and I was not disappointed.

Sandrine Dudoit engaged in discussion with the local biostatisticians. Photo: Juli Atherton

High-throughput sequencing technologies are rapidly taking over the field of genomics, thanks to improvements in the quality and cost of sequencing experiments. And, of course, the fact that they are relatively easy for the general public to understand helps with funding. However, such studies may turn out to be more complex than anticipated…

In her talk, Dr. Dudoit nicely outlined some of the computational and statistical challenges of these technologies. She covered issues ranging from the definition of a biological question and experimental design to the actual data analysis and validation. Focusing on experimental design and higher-level data analysis, Dr. Dudoit presented a thorough study of the problems that can beset this technology. Her results show how much more work is needed in several areas, including controls, normalization, and accounting for the complexity and heterogeneity of biological data.

Dr. Dudoit and Dr. Glass conversing - Bioinformatics meets Mathematical Biology. Photo: Juli Atherton

After the talk, I bothered her with a short interview by e-mail. When asked about the challenges she faces as a statistician speaking to the biological community, she answered:

In spite of the increasing volume of research at the interface between the biological and mathematical sciences, I believe there are still significant communication hurdles between the two fields. Rarely does a single individual possess sufficient depth in biology, statistics, and computing to meet the challenging questions routinely encountered in computational biology. Such questions are best addressed by interdisciplinary teams of experts in each of the relevant fields. Effective communication between biological and mathematical scientists is therefore key to the success of this enterprise. A common language and mutual understanding between biological and mathematical scientists can be achieved only if mathematical scientists acquire proper minimal training in biology and vice versa.

I was also curious to hear her opinion about the enormous amount of biological data currently being produced. Is the development of analytical tools keeping pace? Which areas of mathematical biology should be further promoted?

I think that biologists in academia and industry are generating far more data than can be properly analyzed. The gap between the flood of data and dearth of statistical methodology extends from exploratory data analysis and visualization to high-dimensional inference methods for the joint analysis of multiple and diverse datasets with complex dependence structures among variables (e.g., joint analysis of mRNA-Seq data and Gene Ontology metadata). There is a need for better standards for validating and benchmarking novel statistical methods, e.g., based on negative and positive biological controls and properly designed in silico experiments.  High-quality statistical software and reproducible research are also key to progress in computational biology.

Take-home message: there is still much work to be done. Good news for us: it seems we'll be needed for a while!


Blog authors are solely responsible for the content of the blogs listed in the directory. Neither the content of these blogs, nor the links to other web sites, are screened, approved, reviewed or endorsed by McGill University. The text and other material on these blogs are the opinion of the specific author and are not statements of advice, opinion, or information of McGill.