A Bayesian networks approach to predict protein complexes from
genomic data.
Ronald Jansen (1), Haiyuan Yu (2), Dov Greenbaum (2), Yuval Kluger (2),
Nevan Krogan (3), Sambath Chung (2), Michael Snyder (2), Andrew Emili (3), Jack
Greenblatt (3), Mark Gerstein (2)
(1) Computational Biology Center, Sloan-Kettering Institute, 307 East
63rd Street, New York, NY 10021, USA (jansenr@mskcc.org);
(2) Department of Molecular Biophysics & Biochemistry, 266 Whitney
Avenue, Yale University, PO Box 208114, New Haven, CT 06520, USA;
(3) Banting and Best Department of Medical Research, Department of
Molecular and Medical Research, University of Toronto, Toronto, M5G 1L6,
Ontario, Canada
There is now a large amount of genomic protein-protein interaction
data for yeast, but the datasets are often incomplete and contradictory.
While the need to integrate them to get a comprehensive picture of the
interactome is obvious, actually carrying this out is non-trivial and,
thus far, no practical solution to this mathematical challenge has been
presented. We propose a Bayesian approach wherein different datasets can
be combined in a standardized way and, in contrast to simple
combinations of multiple datasets, weighted probabilistically (creating
probabilistic interactomes). Our approach is based on extrapolation from
small sets of validated interactions in complexes (positives) and
non-interacting proteins (negatives). This allows not only an optimal
integration of experimental interaction data, but also the de novo
prediction of complexes from genomic information that is not interaction
data per se, such as expression, function and phenotype. The resulting
de novo predictions are similar in format to the results of a pull-down
experiment; thus we call our procedure 'virtual pull-down'. We find that
the virtual pull-down of complexes is about as accurate as the
combination of most of the existing experimental interaction data, while
achieving higher coverage. We were successful in verifying several
predictions with TAP-tagging experiments (including protein interactions
involving Nsr1, the nucleosome and complexes related to the eukaryotic
replication fork).
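
The probabilistic weighting described above can be illustrated with a
minimal sketch. It assumes the simplest form of Bayesian network, a naive
Bayes model in which the evidence sources are treated as conditionally
independent; the abstract does not state which network structure is used,
so this is an illustration rather than the authors' implementation. The
function names, feature names, counts and prior odds are all hypothetical.
For each evidence source, a likelihood ratio is estimated from the
positives and negatives, and the ratios are multiplied into the prior odds
that a protein pair belongs to the same complex.

```python
from collections import Counter


def likelihood_ratios(values_pos, values_neg, pseudocount=1.0):
    """Estimate L(v) = P(v | positive) / P(v | negative) per discrete value v.

    values_pos / values_neg: feature values observed for gold-standard
    positive and negative protein pairs (hypothetical toy inputs here).
    A pseudocount avoids division by zero for unseen values.
    """
    categories = set(values_pos) | set(values_neg)
    count_pos, count_neg = Counter(values_pos), Counter(values_neg)
    total_pos = len(values_pos) + pseudocount * len(categories)
    total_neg = len(values_neg) + pseudocount * len(categories)
    return {
        v: ((count_pos[v] + pseudocount) / total_pos)
        / ((count_neg[v] + pseudocount) / total_neg)
        for v in categories
    }


def posterior_odds(prior_odds, evidence, lr_tables):
    """Combine evidence under the naive Bayes (independence) assumption:
    posterior odds = prior odds * product of per-feature likelihood ratios."""
    odds = prior_odds
    for feature, value in evidence.items():
        odds *= lr_tables[feature][value]
    return odds


# Toy gold standard (invented numbers, for illustration only).
lr_tables = {
    "coexpression": likelihood_ratios(
        ["high"] * 8 + ["low"] * 2,   # positives
        ["high"] * 2 + ["low"] * 8,   # negatives
    ),
    "same_function": likelihood_ratios(
        ["yes"] * 8 + ["no"] * 2,
        ["yes"] * 2 + ["no"] * 8,
    ),
}

# Hypothetical prior odds that a random protein pair shares a complex.
PRIOR_ODDS = 0.01

pair_evidence = {"coexpression": "high", "same_function": "yes"}
odds = posterior_odds(PRIOR_ODDS, pair_evidence, lr_tables)
print(f"posterior odds: {odds:.3f}")
```

In this sketch a pair would be predicted to interact when its posterior
odds exceed a chosen cutoff; where to place that cutoff is a separate
decision that the sketch does not address.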