April 17, 2018
1,011. That’s the number of different Saccharomyces cerevisiae yeast strains that were whole-genome sequenced and phenotyped by a team of researchers jointly led by Joseph Schacherer and Gianni Liti, published this week in Nature (Peter et al., 2018; data at: http://bit.ly/1011genomes-DataAtSGD).
Scrupulously gathering isolates of S. cerevisiae from as many diverse geographical locations and ecological niches as possible, the authors and their collaborators plucked yeast cells not only from the familiar wine, beer and bread sources, but also from rotting bananas, sea water, human blood, sewage, termite mounds, and more. The authors then surveyed the evolutionary relationships among the strains to describe the worldwide population distribution of this species and deduce its historical spread.
They found that the greatest amount of genome sequence diversity existed among the S. cerevisiae strains collected from Taiwan, mainland China, and other regions of East Asia. This means that in all likelihood the geographic origin of S. cerevisiae lies somewhere in East Asia. According to the authors, our budding yeast friend began spreading around the globe about 15,000 years ago, undergoing several independent domestication events during its worldwide journey. For example, it turns out that wine yeast and sake yeast were domesticated from different ancestors, thousands of years apart from each other. Whereas genomic markers of domestication appeared about 4,000 years ago in sake yeast, such markers appeared in wine yeast only 1,500 years ago.
Additionally — and similar to the situation where human interspecific hybridization with Neanderthals occurred only after humans migrated out of Africa — it appears that S. cerevisiae has inter-bred very frequently with other Saccharomyces species, especially S. paradoxus, but that most of these interspecific hybridization events occurred after the out-of-China dispersal.
There are many more gems to be found among the treasure trove of information in this paper. Some notable conclusions from the authors include: diploids are the most fit ploidy; copy number variation (CNV) is the most prevalent type of variation; most single nucleotide polymorphisms (SNPs) are very rare alleles in the population; extensive loss of heterozygosity is observed among many strains. There are also phenotype results (fitness values) for 971 strains across 36 different growth conditions.
As is often the case for yeast, the ability to sequence and analyze whole genomes at very deep coverage has yielded broad insights on eukaryotic genome evolution. The team’s work highlights this by presenting a comprehensive view of genome evolution on many different levels (e.g., differences in ploidy, aneuploidy, genetic variants, hybridization, and introgressions) that is difficult to obtain at the same scale and accuracy for other eukaryotic organisms.
SGD is happy to announce that in conjunction with the authors and publishers, we are hosting the datasets from the paper at this SGD download site. These datasets include: the actual genome sequences of the 1,011 isolates; the list of 4,940 common “core” ORFs plus 2,856 ORFs that are variable within the population (together these make up the “pangenome”); copy number variation (CNV) data; phenotyping data for 36 conditions; SNPs and indels relative to the S288C genome; and much more. We hope that the easy availability of these large datasets will be useful to many yeast (and non-yeast) researchers, and as the authors say, will help to “guide future population genomics and genotype–phenotype studies in this classic model system.”
September 08, 2016
Ever wonder how quickly your favorite protein turns over within the cell? SGD has just incorporated half-life data for 3,700 yeast proteins from a paper by Christiano et al., 2014. In this study, Christiano and colleagues pulse-labeled exponentially growing wild-type yeast cells in synthetic medium with a heavy lysine isotope (pulse SILAC), and followed the decay of native untagged proteins using high-resolution mass spectrometry-based proteomics. The data generated in this study can be accessed by viewing the Experimental Data section of the Protein tab for your favorite gene, such as the short-lived Ctk1p or the long-lived Rsc1p.
In addition, you can retrieve this half-life data using YeastMine for one or more proteins with the Gene–>Protein Half-life template or obtain a list of proteins with half lives within a given range using the Retrieve–>Proteins with half-life in a given range template. Both of these templates can be found in the “Templates” section of YeastMine under the “Protein” category.
Thanks to Romaine Christiano and Tobias Walther for their help integrating this information into SGD.
Categories: New Data
June 06, 2016
We’ve added 1,400 high-throughput (HTP) cellular component GO annotations from a new paper published by Maya Schuldiner’s lab. In this paper, Yofe et al., 2016 devised and implemented a methodology called SWAT (short for SWAp-Tag) and used it to create a parental library of 1,800 strains whose tagged proteins are all known or predicted to localize to the yeast endomembrane system. Once created, this novel acceptor library serves as a template that can be ‘swapped’ into other libraries, thus facilitating the rapid interconversion to new libraries by simply replacing the acceptor module with a new tag or sequence of choice. As proof of principle, this paper describes the parental library (N’ SWAT-GFP), and its utility as a gateway to the construction of two additional libraries (N’ mCherry and N’ seamless GFP). A high-content screening platform was used to generate images that were then manually reviewed and used to assign subcellular locations for proteins in these collections. Based on these results, SGD has incorporated GO annotations for proteins when at least two of three tags gave the same cellular localization. In addition, Locus Summary page descriptions for genes within this collection that did not have a known cellular location prior to this study have been updated. Finally, this study also provides access to a list of proteins predicted to contain signal peptides using three different algorithms. We would like to thank Maya Schuldiner and members of her lab for help with the integration of this information into SGD.
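The "at least two of three tags" rule described above can be sketched in a few lines of Python. This is only an illustration of the agreement rule, not SGD's curation pipeline; the function name, the `min_agree` parameter, and the use of `None` for a tag without a usable call are all assumptions made for the example.

```python
from collections import Counter


def consensus_localization(calls, min_agree=2):
    """Return the localization shared by at least `min_agree` tags, else None.

    `calls` holds one localization string per tagged library (for example
    the three N' libraries); None marks a tag with no usable call.
    This is a hypothetical helper, not part of any SGD tool.
    """
    counts = Counter(c for c in calls if c is not None)
    if not counts:
        return None
    # With three tags and min_agree=2, at most one location can qualify.
    location, n = counts.most_common(1)[0]
    return location if n >= min_agree else None
```

Called as `consensus_localization(["ER", "ER", "vacuole"])`, the helper returns the shared location; when all three tags disagree it returns `None`, which corresponds to making no annotation.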
Categories: New Data
June 30, 2015
Yeast and humans diverged about a billion years ago, but there’s still enough functional conservation between some pairs of yeast and human genes that they can be substituted for each other. How cool is that?! Which genes are they? What do they do?
This two-minute video explains how to find, search, and download the yeast-human functional complementation data in SGD. You can find help with many other aspects of SGD in the tutorial videos on our YouTube channel. And as always, please be sure to contact us with any questions or suggestions.
June 10, 2015
Yeast and humans diverged about a billion years ago. So if there’s still enough functional conservation between a pair of similar yeast and human genes that they can be substituted for each other, we know they must be critically important for life. An added bonus is that if a human protein works in yeast, all of the awesome power of yeast genetics and molecular biology can be used to study it.
To make it easier for researchers to identify these “swappable” yeast and human genes, we’ve started collecting functional complementation data in SGD. The data are all curated from the published literature, via two sources. One set of papers was curated at SGD, including the recent systematic study of functional complementation by Kachroo and colleagues. Another set was curated by Princeton Protein Orthology Database (P-POD) staff and is incorporated into SGD with their generous permission.
As a starting point, we’ve collected a relatively simple set of data: the yeast and human genes involved in a functional complementation relationship, with their respective identifiers; the direction of complementation (human gene complements yeast mutation, or vice versa); the source of curation (SGD or P-POD); the PubMed ID of the reference; and an optional free-text note adding more details. In the future we’ll incorporate more information, such as the disease involvement of the human protein and the sequence differences found in disease-associated alleles that fail to complement the yeast mutation.
You can access these data in two ways: using two new templates in YeastMine, our data warehouse; or via our Download page. Please take a look, let us know what you think, and point us to any published data that’s missing. We always appreciate your feedback!
YeastMine is a versatile tool that lets you customize searches and create and manipulate lists of search results. To help you get started with YeastMine we’ve created a series of short video tutorials explaining its features.
This template lets you query with a yeast gene or list of genes (either your own custom list, or a pre-made gene list) and retrieve the human gene(s) involved in cross-species complementation along with all of the data listed above.
This template takes either human gene names (HGNC-approved symbols) or Entrez Gene IDs for human genes and returns the yeast gene(s) involved in cross-species complementation, along with the data listed above. You can run the query using a single human gene as input, or create a custom list of human genes in YeastMine for the query. We’ve created two new pre-made lists of human genes that can also be used with this template. The list “Human genes complementing or complemented by yeast genes” includes only human genes that are currently included in the functional complementation data, while the list “Human genes with yeast homologs” includes all human genes that have a yeast homolog as predicted by any of several methods.
If you’d prefer to have all the data in one file, simply visit our Curated Data download page and download the file “functional_complementation.tab”.
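Once downloaded, a tab-delimited file like this can be loaded with Python's standard `csv` module. The sketch below is a starting point only: the column order and field names in `ASSUMED_COLUMNS` are guesses based on the data types listed in this post, not the documented layout of the file, and the comment-line prefixes are likewise an assumption. Check the file itself before relying on them.

```python
import csv

# Hypothetical column order, inferred from the data types described in this
# post; verify against the actual file before use.
ASSUMED_COLUMNS = [
    "yeast_gene", "yeast_id", "human_gene", "human_id",
    "direction", "pubmed_id", "source", "note",
]


def read_complementation(lines):
    """Parse tab-delimited lines into dicts keyed by ASSUMED_COLUMNS.

    `lines` may be an open file handle or any iterable of strings.
    """
    # QUOTE_NONE keeps quotation marks in free-text notes as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        # Skip blank lines and comment lines (assumed to start with '#' or '!').
        if not fields or fields[0].startswith(("#", "!")):
            continue
        # Pad short rows, since the free-text note is optional.
        fields = fields + [""] * (len(ASSUMED_COLUMNS) - len(fields))
        rows.append(dict(zip(ASSUMED_COLUMNS, fields)))
    return rows


def human_partners(rows, yeast_gene):
    """Return the sorted human gene symbols recorded for one yeast gene."""
    return sorted({r["human_gene"] for r in rows if r["yeast_gene"] == yeast_gene})
```

To use it on the real file, open it with `open("functional_complementation.tab", newline="")`, pass the handle to `read_complementation`, and then call `human_partners` with a yeast gene name of interest.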
February 23, 2015
SGD curators periodically update the chromosomal annotations of the S. cerevisiae Reference Genome, which is derived from strain S288C. Last November, the genome annotation was updated for the first time since the release of the major S288C resequencing update in February 2011. Note that the underlying sequence of 16 assembled nuclear chromosomes, plus the mitochondrial genome, remained unchanged in annotation release R64.2.1 (relative to genome sequence release R64.1.1).
The R64.2.1 annotation release included various updates and additions. The annotations of 2 existing proteins changed (GRX3/YDR098C and HOP2/YGL033W), and 1 new ORF (RDT1/YCL054W-A) and 4 RNAs (RME2, RME3, IRT1, ZOD1) were added to the genome annotation. Other additions include 8 nuclear matrix attachment sites, and 8 mitochondrial origins of replication. The coordinates of many autonomously replicating sequences (ARS) were updated, and many new ARS consensus sequences were added. Complete details can be found in the Summary of Chromosome Sequence and Annotation Updates.
December 17, 2014
Have you ever wondered what’s happening to your favorite protein as it’s hanging out in the cell? SGD’s advanced search tool, YeastMine, now includes four new templates that can be used to find protein modification and abundance data.
The Gene -> Protein Modifications template retrieves phosphorylation, ubiquitination, succinylation, acetylation and methylation data, currently curated from the following 11 publications: Peng et al. 2003, Hitchcock et al. 2003, Seyfried et al. 2008, Vogtle et al. 2009, Ziv et al. 2011, Mommen et al. 2012, Henriksen et al. 2012, Swaney et al. 2013, Kolawa et al. 2013, Weinert et al. 2013, and Wang et al. 2014.
The Gene -> Experimental N-termini and N-terminal modifications template retrieves experimentally-determined amino-terminal sequence and acetylation data, currently curated from Vogtle et al. 2009 and Mommen et al. 2012.
Lastly, two new templates pull protein abundance data curated from Ghaemmaghami et al. 2003. Gene -> Protein Abundance retrieves molecules/cell counts for a gene or list of genes. The same data can be quickly filtered using the Retrieve -> Proteins in a given molecules/cell abundance range template.
Please explore these new YeastMine protein data templates, and send us your feedback.
December 08, 2014
At SGD, we are expanding our scope to provide annotation and comparative analyses of all major budding yeast strains, and are making progress in our move toward providing multiple reference genomes. To this end, the following new S. cerevisiae genomes have been incorporated into SGD as “Alternative References”: CEN.PK, D273-10B, FL100, JK9-3d, RM11-1a, SEY6210, SK1, Sigma1278b, W303, X2180-1A, Y55. These genomes are accessible via Sequence, Strain, and Contig pages, and are the genomes for which we have curated the most phenotype data, and for which we aim to curate specific functional information. It is important to emphasize that we are not abandoning a standard sequence; S288C is still in place as “The Reference Genome”. However, we do recognize that it is helpful for students and researchers to be able to ‘shift the reference’, selecting the genome that is most appropriate and informative for a specific area of study.
These new genome sequences have also been added to SGD’s BLAST datasets, multiple sequence alignments, the Pattern Matching tool, and the Downloads site. Please explore these new genomes, and send us your feedback.
October 13, 2014
We are pleased to announce that the redesign of our gene-specific pages, which has been ongoing over the past year, is now complete with the release of the reworked Locus Summary page. The page contains all of the information on the previous Locus Summary page, and has a more modern look and feel. Note that the order and organization of the sections have changed, and the order of the tabs across the top of the page has changed as well. New elements on the page include a navigation bar on the left to take you to the different sections of the page, a redesigned map showing genomic context in the sequence section, and a new interactive histogram summarizing expression data. Biochemical pathway information now appears in its own section (see an example), and we have added a History section to replace the previous Locus History tab. If there are no data of a particular type (for example, Pathways), then that section is absent from the page.
Please explore this new page and send us your feedback.
October 06, 2014
The Expression pages have been redesigned and now include a clickable histogram depicting conditions and datasets in which the gene of interest is up- or down-regulated. Expression data are derived from records contained in the Gene Expression Omnibus, and datasets are assigned one or more categories to facilitate grouping, filtering and browsing. Short descriptions of the focus of each experiment are also provided. The PCL files generated for each dataset are used to populate the expression analysis tool SPELL. Also included on the pages are network diagrams which display genes that share expression profiles. The Expression pages provide seamless access to the SPELL tool at SGD, as well as external resources such as Cyclebase, GermOnline, YMGV and FuncBase.
Please explore these new pages, accessible via the Expression tab on your favorite Locus Summary page, and send us your feedback.
September 15, 2014
Have you ever wondered about the role played by the homolog of a particular yeast gene in other fungal species? SGD’s advanced search tool, YeastMine, can now be used to find homologs of your favorite Saccharomyces cerevisiae genes in the pathogenic yeast, Candida glabrata. There are now 25 species of pathogenic and non-pathogenic fungi in YeastMine, including S. cerevisiae.
The fungal homologs of a given S. cerevisiae gene can be found using the template called “Gene –> Fungal Homologs.” Fungal homology data comes from various sources including FungiDB, the Candida Gene Order Browser (CGOB), the Yeast Gene Order Browser (YGOB), the Candida Genome Database (CGD), the Aspergillus Genome Database (AspGD) and PomBase, and the results link directly to the corresponding homolog gene pages in the relevant databases.
A results table is generated after each query and the identifiers and standard names for the fungal homologs are listed in the table. As with other YeastMine templates, results can be saved as lists for further analysis. You can also create a list of yeast gene names and/or identifiers using the updated Create Lists feature that allows you to specify the organism representing the genes in your list. The query for homologs can then be made against the custom gene list.
All of the new templates that query fungal homolog data can be found on the YeastMine Home page under the “Homology” tab. These templates complement the template “Gene → Non-Fungal and S. cerevisiae Homologs” that retrieves homologs of S. cerevisiae genes in humans, rats, mice, worms, flies, mosquitos, and zebrafish.
August 25, 2014
New Sequence pages are now available in SGD for virtually every yeast gene (e.g., HMRA1 Sequence page), and include genomic sequence annotations for the Reference Strain S288C, as well as several Alternative Reference Genomes from strains such as CEN.PK, RM11-1a, Sigma1278b, and W303 (more Alternative References coming soon). Each page includes an Overview section containing descriptive information, maps depicting genomic context in Reference Strain S288C (as shown below) and Alternative Reference strains, as well as chromosomal and relative coordinates in S288C.
The sequence itself includes display options for genomic DNA, coding DNA, or translated protein.
Also available on each Sequence page are links to redesigned S288C Chromosome pages, links to new Contig pages for Alternative Reference Genomes, and a Downloads menu for easy access to DNA sequences of several other industrial strains and environmental isolates. The new Sequence, Chromosome, and Contig pages make use of many of the features you enjoy on other new or redesigned pages at SGD, including graphical display of data, sortable tables, and responsive visualizations. The Sequence pages also provide seamless access to other tools at SGD such as BLAST and Web Primer. Please explore these new pages, accessible via the Sequence tab on your favorite Locus Summary page, and send us your feedback.
June 24, 2014
We have redesigned the Protein page to include a new tabular display of protein domains. This table provides the identifier for each domain and illustrates the respective locations of the domains within the protein. In addition to this new table, the domains are displayed in an interactive network diagram that presents the proteins that share these domains with your protein of interest (see figure below, left).
Another new feature on the Protein page is the display of phosphorylation sites within the protein’s sequence (as curated by BioGRID). This feature is available for both the reference strain S288C and other commonly used S. cerevisiae strains, using the pull-down to select the desired strain view (see figure below, right).
March 26, 2014
What happens when you cross two comprehensive deletion mutant collections with a library of more than 1,800 structurally diverse chemicals? HIP HOP happens. Not the music, but a whole lot of very informative phenotype data – over 40 million data points!
The response of S. cerevisiae mutant strains to a chemical can tell us a lot about which pathways or processes the chemical affects. This is not only interesting for yeast biologists, but also has important implications for human molecular biology and disease research. So a group at The Novartis Institutes of Biomedical Research decided to test the sensitivity of nearly 6,000 mutant yeast strains to a panel of about 1,800 compounds.
Hoepfner and colleagues have published these results and have also generously offered them to SGD. They used the HIP and HOP methods (HIP, HaploInsufficiency Profiling, using diploid heterozygous deletion mutant strains; HOP, HOmozygous deletion Profiling, using diploid homozygous deletion mutant strains) that have proven very useful in yeast since the creation of the systematic deletion mutant collections.
To do this mammoth series of experiments they obviously needed to set up an automated pipeline. These sorts of experiments have been done before, but in this study Hoepfner et al. improved on existing procedures in many ways: the physical techniques, the controls and replicates included, and the methods for data analysis.
Phenotype annotations in SGD. We’ve incorporated a subset of these results into SGD as mutant phenotype annotations. Why a subset? Some of the chemicals that were used in these experiments are un-named proprietary compounds, so the individual phenotypes would not be very informative in the context of SGD. We’ve added the phenotypes that involve named chemicals to SGD – more than 5,500 annotations. These may be viewed on Phenotype Details pages for individual genes (see example), retrieved as a set using YeastMine, or downloaded along with all SGD mutant phenotype annotations in our phenotype data download file.
Easy access to the full dataset and analyses. We’ve also added a new set of links to SGD that take you directly from your favorite gene to the authors’ website, which provides full access to all of the data and interesting ways to look at it (see below). When you click on a “HIP HOP Profile” link from the Locus Summary page or the Phenotype Details page of a gene in SGD, the landing page at the authors’ website allows you to explore data for mutants in that gene or for chemicals affecting that mutant strain. You can see which chemicals had the greatest effects, which other mutant strains have a similar range of phenotypes, and much more. And if a chemical that has interesting effects is proprietary, don’t worry; Hoepfner and colleagues have stated that they “encourage future academic collaborations around individual compounds used in this study.”
Information about mutant strains. In the course of this study, the authors also generated some very useful data about particular mutant strains in the deletion collection. Some of them were hypersensitive to more than 100 different chemicals. Others turned out to be carrying additional background mutations that could affect the phenotypes of the mutant strain. We are planning to display this kind of information (from this and other studies) directly on SGD Phenotype Details pages in the future.
We thank Dominic Hoepfner and colleagues for sharing these data with SGD and for helping us to incorporate the data. And we encourage you to explore this new resource and contact us with any questions or suggestions.
Categories: New Data
March 13, 2014
Towards the goal of compiling datasets to produce a complete transcriptome of yeast (the set of all RNA molecules produced in a single cell or population of cells), we have loaded a defined set of transcripts, based primarily on data from Pelechano et al. but supported by other datasets, into SGD’s flexible search tool, YeastMine. The representative set includes transcripts that Pelechano et al. identified by simultaneous determination of the 5’ and 3’ ends of mRNA molecules, and whose end coordinates are supported by datasets from other laboratories.
The transcript data can be accessed in YeastMine using the ‘Gene -> Transcripts’ template, which allows you to specify a gene name or list of gene names and return the list of all associated transcripts based on the collection of data described above. The results include the start and end coordinates for each transcript, the number of counts observed for each transcript in glucose and galactose, notes, and references for the relevant datasets.
Categories: New Data
March 04, 2014
You can now use SGD’s advanced search tool, YeastMine, to find the human homolog(s) of your favorite yeast gene and their corresponding disease associations. Or, begin with your favorite human gene or disease keyword and retrieve the yeast counterparts of the relevant gene(s). As an example, you can search for the S. cerevisiae homologs of all human genes associated with disorders that contain the keyword “diabetes” (view search).
We have recently loaded data from OMIM (Online Mendelian Inheritance in Man) into our fast, flexible search resource, YeastMine, and provided 3 predefined queries (templates) that make it simple to perform the above searches. Newly updated HomoloGene, Ensembl, TreeFam, and Panther data sets are used to define the homology between S. cerevisiae and human genes. The results table provides identifiers and standard names for the yeast and human genes, as well as OMIM gene and disease identifiers and names. As with other YeastMine templates, results can be saved as lists and analyzed further. You can also now create a list of human gene names and/or identifiers using the updated Create Lists feature that allows you to specify the organism representing the genes in your list. The query for yeast homologs can then be made against this list.
In addition to human disease homologs, we have incorporated fungal homolog data for 24 additional species of fungi. You can now query for the fungal homologs of a given S. cerevisiae gene using the template “Gene –> Fungal Homologs.” This fungal homology data comes from various sources including FungiDB, the Candida Gene Order Browser (CGOB), and PomBase, and the results link directly to the corresponding gene pages in the relevant databases, including Candida Genome Database (CGD) and Aspergillus Genome Database (AspGD).
All of the new templates that query human and fungal homolog data can be found on the YeastMine Home page under the new tab “Homology.” These templates complement the template “Gene → Non-Fungal and S. cerevisiae Homologs” that retrieves homologs of S. cerevisiae genes in human, rat, mouse, worm, fly, mosquito, and zebrafish.
Watch the Human Disease & Fungal Homologs in SGD’s YeastMine tutorial (below) to learn how to find and use these new templates.
February 21, 2014
Did you know you can find and contribute teaching and other educational resources to SGD? We have updated our Educational Resources page, found on the SGD Community Wiki. There are links to teaching resources such as classroom materials, courses, and fun sites, as well as pointers to books, dedicated learning sites, and tutorials that can help you learn more about basic genetics. Many thanks to Dr. Erin Strome and Dr. Bethany Bowling of Northern Kentucky University for being the first to contribute to this updated site by providing a series of Bioinformatics Project Modules designed to introduce undergraduates to using SGD and other bioinformatics resources.
We would like to encourage others to contribute additional teaching or general educational resources to this page. To do so, just request a wiki account by contacting us at the SGD Help desk – you will then be able to edit the SGD Community Wiki. If you prefer, we would also be happy to assist you directly with these edits.
Note that there are many other types of information you can add to the SGD Community Wiki, including information about your favorite genes, protocols, upcoming meetings, and job postings. The Community Wiki can be accessed from most SGD pages by clicking on “Community” on the main menu bar and selecting “Wiki.” The Educational Resources page is linked from the left menu bar under “Resources” from all the SGD Community Wiki pages. For more information on this newly updated page, please view the video below, “Educational Resources on the SGD Community Wiki.”
February 12, 2014
Annotation Extension data for select GO annotations are now available at SGD. The Annotation Extension field (also referred to as column 16 after its position in the gene_association file of GO annotations) was introduced by the Gene Ontology Consortium (GOC) to capture details such as substrates of a protein kinase, targets of regulators, or spatial/temporal aspects of processes. The information in this field serves to provide more biological context to the GO annotation. At SGD, these data are accessible for select GO annotations via the small blue ‘i’ icon on the newly redesigned GO Details pages. See, for example, the substrate information for MEK1 kinase (image below). Currently, a limited number of GO annotations contain data in this field because we have only recently begun to capture this information; more will be added in the future.
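For readers working with the gene_association file directly, here is a minimal Python sketch of pulling the Annotation Extension out of column 16. It assumes the tab-delimited GAF 2.x layout, in which comment lines start with `!`, each extension is written as `relation(target)`, a `|` separates independent extensions, and a `,` joins extensions that apply together. The function name is hypothetical, and any identifiers you feed it in testing should be treated as placeholders rather than real annotations.

```python
def parse_annotation_extension(gaf_line):
    """Return column 16 of a GAF line as a list of alternatives.

    Each alternative is a list of (relation, target) pairs that apply
    together. Lines without an extension yield an empty list.
    """
    if gaf_line.startswith("!"):            # GAF comment or header line
        return []
    columns = gaf_line.rstrip("\n").split("\t")
    if len(columns) < 16 or not columns[15]:
        return []
    alternatives = []
    for group in columns[15].split("|"):    # '|' separates independent extensions
        pairs = []
        for item in group.split(","):       # ',' joins extensions that apply together
            relation, _, target = item.partition("(")
            pairs.append((relation.strip(), target.rstrip(")").strip()))
        alternatives.append(pairs)
    return alternatives
```

A column-16 value such as `has_input(SGD:S000000001)` would come back as a single alternative holding one `("has_input", "SGD:S000000001")` pair, which is the kind of substrate detail shown behind the ‘i’ icon.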
We have also redesigned the GO Details and Phenotype Details tab pages to make it easier to understand and make connections within the data. In addition to all of the annotations that were previously displayed, these pages now include graphical summaries, interactive network diagrams displaying relationships between genes and tables that can be sorted, filtered, or downloaded. In addition, SGD Paper pages, each focusing on a particular reference that has been curated in SGD, now show all of the various types of data that are derived from that paper in addition to the list of genes covered in the paper (example). These pages provide seamless access to other tools at SGD such as GO Term Finder, GO Slim Mapper, and YeastMine. Please explore all of these new features from your favorite Locus Summary page and send us your feedback.
November 26, 2013
Transcriptional regulation data are now available on new “Regulation” tab pages for virtually every yeast gene. We are collaborating with the YEASTRACT database to display regulation annotations curated both by SGD and by YEASTRACT on these new pages. Regulation annotations are each derived from a published reference, and include a transcriptional regulator, a target gene, the experimental method used to determine the regulatory relationship, and additional data such as the strain background or experimental conditions. The relationships between regulators and the target gene are also depicted in an interactive Network Visualization diagram. The Regulation tab for DNA-binding transcription factors (TFs) includes these items and additionally contains a Regulation Summary paragraph summarizing the regulatory role of that TF, a table listing its protein domains and motifs, DNA binding site information, a table of its regulatory target genes, and an enrichment of the GO Process terms to which its target genes are annotated (view an example). In the coming months we will be adding this extra information to the Regulation pages of other classes of TFs, such as those that act by binding other TFs.
We have also completely redesigned the web display of the Interactions and Literature tab pages, which now include graphical display of data, sortable tables, interactive visualizations, and more navigation options. These pages provide seamless access to other tools at SGD such as GO tools and YeastMine. Please feel free to explore all of these new features from your favorite Locus Summary page and send us your feedback.
August 27, 2013
SGD has compiled a selection of seminal yeast literature, comprising landmark papers in yeast biology. The list is available on the SGD Wiki and includes important publications on cell biology, early genetic maps and genome surveys, and the original S288C sequencing consortium. Also listed are key papers describing the genomes of other sequenced strains of S. cerevisiae.
This new page is just one of the many resources already available on the SGD Wiki, such as What are Yeast?, Protocols, and Job listings. We encourage you to add additional information to any of the SGD Wiki pages. If you don’t already have an SGD Wiki account, please contact the SGD Help Desk to request one.
Categories: New Data