Askenazi M and Linial M (2012) Implicit biology in peptide spectral libraries. Anal Chem 84(18):7919-25
Abstract: Mass spectral libraries are collections of mass spectra curated specifically to facilitate the identification of small molecules, metabolites, and short peptides. One of the most comprehensive peptide spectral libraries is curated by NIST and contains upward of half a million annotated spectra dominated by human and model organisms including budding yeast and mouse. While motivated primarily by the technological goal of increasing sensitivity and specificity in spectral identification, we have found that the NIST spectral library constitutes a surprisingly rich source of biological knowledge. In this Article, we show that data-mining of these published libraries while applying strict empirical thresholds yields many characteristics of protein biology. In particular, we demonstrate that the size and increasingly comprehensive nature of these libraries, generated from whole-proteome digests, enables inference from the presence but crucially also from the absence of spectra for individual peptides. We illustrate implicit biological trends that lead to significant absence of spectra accounted for by complex post-translational modifications and overlooked proteolytic sites. We conclude that many subtle biological signatures such as genetic variants, regulated proteolysis, and post-translational modifications are exposed through the systematic mining of spectral collections originally compiled as general-purpose, technology-oriented resources.
Status: Published | Type: Journal Article | PubMed ID: 22909014