2006 Yeast Genetics and Molecular Biology Meeting
Princeton University
Princeton, New Jersey USA
July 25 - 30, 2006


Abstract #2

Goal-directed evidence integration for predicting biological networks in yeast. Chad Myers, Olga Troyanskaya. Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ.
   Broad availability of functional genomic data for S. cerevisiae has enabled new computational approaches to the global analysis and prediction of biological networks. However, a remaining challenge in predicting networks from experimental data is harnessing the information present in diverse types of evidence while robustly accommodating noise and variability in the data. In generating experimentally testable models of networks from noisy data, it is essential to consider process-dependent variation in relevance and reliability. Our comprehensive study of public genomic data for yeast reveals that the relevance and quality of most data vary dramatically across biological processes, suggesting an opportunity to improve most current data integration and prediction methods. We demonstrate how this inherent, process-dependent variation can be detected and exploited to achieve more accurate prediction of biological networks. More generally, we propose a new paradigm for genomic integration and prediction methods in which every stage of the process is motivated by a specific biological context. Biologists typically use bioinformatics approaches with a specific focus in mind. This focus, together with the user’s expert domain knowledge, can be used to direct and improve the extraction of relevant information from noisy data. We show that the more specifically we define the goal for machine learning approaches, the more we stand to benefit from them. We have implemented our goal-driven integration framework and query interface in a public, web-accessible system, expressly designed to allow frequent data updates as new experimental yeast data are published. Our input evidence currently comprises data from more than 6,500 publications, including physical and genetic interactions, gene expression, localization, and coding and regulatory sequence data.
We have used our system to study the processes of DNA replication and chromosome segregation, generating specific, testable hypotheses that we have confirmed experimentally.

