New & Noteworthy

2D RNA structures from RNAcentral

October 20, 2022

SGD has updated our RNA pages to add secondary structures provided by RNAcentral and generated by R2DT. Thumbnails and linkouts to RNAcentral via RNAcentral IDs are shown on the Summary and Sequence pages.

Interactive secondary structure viewers are available on the Sequence pages.

Take the pages for a spin! For more information about the structures, please see the Help page at RNAcentral.

Categories: New Data, Website changes

Tags: RNA structure

SGD Homology Data Now Available On New Homology Pages

March 25, 2021

SGD is excited to introduce our new Homology Pages! These pages can be accessed by clicking on the Homology tab in the header of SGD gene pages.

The information displayed on the Homology Pages is divided into several sections:

Homologs: Information about known homologs for the gene of interest, such as the species of the homolog, the corresponding Gene ID from the Alliance of Genome Resources, and the name of the homolog.
Functional Complementation: Data about cross-species functional complementation between yeast and other species, curated by SGD and the Princeton Protein Orthology Database (P-POD).
Fungal Homologs: Curated homolog information for 24 additional species of fungi. View the species of the fungal homolog, the database source of the entry, and the Gene ID of the homolog from that database.
External Identifiers: A list of external identifiers for the protein from various database sources.

If you have any questions or feedback regarding our new Homology Pages, please do not hesitate to contact us at any time.

Categories: Data updates, Homologs, New Data, Yeast and Human Disease

Sequence Variant Tracks Added to JBrowse

June 12, 2019

We are excited to announce that 50 new “Variants” data tracks are now available for use in our genome browsing tool JBrowse. Utilizing whole-genome sequencing data published by Song et al. (2015), these data tracks visualize how the sequences of 25 S. cerevisiae strains differ from that of the reference genome strain, S288C.

Two data tracks are available for each of the 25 strains: a track that indicates single nucleotide polymorphisms (SNPs) relative to strain S288C, and a track that shows insertions or deletions (“indels”) relative to S288C.

Accessing these new data tracks is easy: just enter JBrowse and click on the “Select tracks” tab in the upper left-hand part of the page. Then, select the “variants” category. You can also download the variants, annotation, and sequence files for these strains for use in your own analyses.

If you’re new to JBrowse, don’t miss out—getting started takes no time at all. For information on how to use this tool, be sure to check out the JBrowse playlist on the SGD YouTube Channel or visit the JBrowse help page. If you have any questions or feedback about the new “Variants” data tracks or about our genome browsing tool, please don’t hesitate to contact us.

Table of strains with “Variants” data tracks in JBrowse, along with links to download their respective dataset:

Strain (link to Variants tracks in JBrowse)	File Download Link
BC187	BC187_Stanford_2014_JRII00000000.zip
BY4741	BY4741_Stanford_2014_JRIS00000000.zip
BY4742	BY4742_Stanford_2014_JRIR00000000.zip
CEN_PK2-1Ca	CEN.PK2-1Ca_Stanford_2014_JRIV01000000.zip
D273-10B	D273-10B_Stanford_2014_JRIY00000000.zip
DBVPG6044	DBVPG6044_Stanford_2014_JRIG00000000.zip
FL100	FL100_Stanford_2014_JRIT00000000.zip
FY1679	FY1679_Stanford_2014_JRIN00000000.zip
JK9-3d	JK9-3d_Stanford_2014_JRIZ00000000.zip
K11	K11_Stanford_2014_JRIJ00000000.zip
L1528	L1528_Stanford_2014_JRIK00000000.zip
RedStar	RedStar_Stanford_2014_JRIL00000000.zip
RM11-1A	RM11-1A_Stanford_2014_JRIP00000000.zip
SEY6210	SEY6210_Stanford_2014_JRIW00000000.zip
Sigma1278b-10560-6B	Sigma1278b-10560-6B_Stanford_2014_JRIQ00000000.zip
SK1	SK1_Stanford_2014_JRIH00000000.zip
UWOPS05_217_3	UWOPS05-217-3_Stanford_2014_JRIM00000000.zip
W303	W303_Stanford_2014_JRIU00000000.zip
X2180-1A	X2180-1A_Stanford_2014_JRIX00000000.zip
Y55	Y55_Stanford_2014_JRIF00000000.zip
YJM339	YJM339_Stanford_2014_JRIE00000000.zip
YPH499	YPH499_Stanford_2014_JRIO00000000.zip
YPS128	YPS128_Stanford_2014_JRID00000000.zip
YPS163	YPS163_Stanford_2014_JRIC00000000.zip
YS9	YS9_Stanford_2014_JRIB00000000.zip
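
For scripted downloads, the file names in the table can be split into a strain name and an accession. The following is a minimal sketch, assuming every file follows the `<strain>_Stanford_2014_<accession>.zip` pattern seen above; the regular expression and the `parse_download_name` helper are illustrative and not part of any SGD tool. Only a few rows are copied from the table.

```python
import re

# A few rows copied from the table above: (strain label, download file name).
ROWS = [
    ("BC187", "BC187_Stanford_2014_JRII00000000.zip"),
    ("CEN_PK2-1Ca", "CEN.PK2-1Ca_Stanford_2014_JRIV01000000.zip"),
    ("Sigma1278b-10560-6B", "Sigma1278b-10560-6B_Stanford_2014_JRIQ00000000.zip"),
    ("UWOPS05_217_3", "UWOPS05-217-3_Stanford_2014_JRIM00000000.zip"),
]

# Assumed pattern: <strain>_Stanford_2014_<accession>.zip,
# where the accession is four letters followed by eight digits.
FILE_PATTERN = re.compile(
    r"^(?P<strain>.+)_Stanford_2014_(?P<accession>[A-Z]{4}\d{8})\.zip$"
)


def parse_download_name(file_name):
    """Return (strain, accession) parsed from a download file name."""
    match = FILE_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name}")
    return match.group("strain"), match.group("accession")


parsed = {label: parse_download_name(name) for label, name in ROWS}

# Two track types (SNPs and indels) for each of the 25 strains.
N_STRAINS = 25
TRACK_TYPES = ("SNPs", "indels")
total_tracks = N_STRAINS * len(TRACK_TYPES)  # 50
```

Note that the strain label shown in the table and the strain name embedded in the file name are not always identical (for example, CEN_PK2-1Ca versus CEN.PK2-1Ca), so it is safer to key lookups on the table label.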

Categories: New Data

Explore the S288C Transcriptome in JBrowse

April 25, 2019

We have recently equipped our genome browsing tool JBrowse with 9 new Transcriptome data tracks, making JBrowse an even more powerful way to explore the vast heterogeneity of the S288C transcriptome. These information-rich data tracks visualize RNA transcripts from the TIF-seq dataset published by Pelechano et al. (2013), enabling quick and easy viewing of the position, length, and abundance of transcript isoforms sequenced in the study.

You can easily access these new tracks by entering JBrowse and clicking on the left-hand “Select tracks” tab. They are located in the Transcriptome category. In addition to viewing the data in JBrowse, you can also download the .gff3 and .bw files for these tracks for use in your own analyses.

Check out our video tutorial from the SGD YouTube channel at the top of this page for a quick overview of the new transcriptome data tracks and how to access them. More information about these tracks and how SGD created them can also be found on our Genome Browser help page.

If you have any questions or feedback about the new Transcriptome data tracks or about our genome browser, please don’t hesitate to contact us.

Data tracks that visualize transcript isoforms that fully overlap a gene coding region:

Data Track Title	Description
longest_full-ORF_transcripts_ypd	This track contains the longest transcript overlapping each individual ORF completely for WT cells grown in glucose (ypd) media.
longest_full-ORF_transcripts_gal	This track contains the longest transcript overlapping each individual ORF completely for WT cells grown in galactose (gal) media.
most_abundant_full-ORF_transcripts_ypd	This track contains the most abundant transcript overlapping each individual ORF completely for WT cells grown in glucose (ypd) media.
most_abundant_full-ORF_transcripts_gal	This track contains the most abundant transcript overlapping each individual ORF completely for WT cells grown in galactose (gal) media.
unfiltered_full-ORF_transcripts	This track contains all transcripts that completely overlap an individual open reading frame (ORF) for WT cells grown in either glucose (ypd) or galactose (gal) media.
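
The difference between the "longest" and "most abundant" tracks can be illustrated with a small sketch. This is not the procedure SGD used to build the tracks; it assumes each isoform is described by hypothetical start, end, and read-count fields, ignores strand for brevity, and uses made-up coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    count: int  # number of reads supporting this isoform (hypothetical field)


def covers_orf(transcript, orf_start, orf_end):
    """True if the transcript spans the whole ORF."""
    return transcript.start <= orf_start and transcript.end >= orf_end


def summarize(transcripts, orf_start, orf_end):
    """Return (longest, most_abundant) among isoforms that fully cover the ORF."""
    full = [t for t in transcripts if covers_orf(t, orf_start, orf_end)]
    if not full:
        return None, None
    longest = max(full, key=lambda t: t.end - t.start + 1)
    most_abundant = max(full, key=lambda t: t.count)
    return longest, most_abundant


# Made-up isoforms around a made-up ORF at positions 100..400.
isoforms = [
    Transcript(50, 450, 3),
    Transcript(90, 410, 12),
    Transcript(150, 450, 40),  # starts inside the ORF, so it is excluded
    Transcript(80, 500, 1),
]
longest, most_abundant = summarize(isoforms, 100, 400)
```

Here the longest full-ORF isoform (80 to 500) and the most abundant one (90 to 410) differ, which is why the two kinds of track can show different transcripts for the same gene.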

Data tracks that quantify the number of transcripts that cover a given nucleotide in the S288C genome:

Data Track Title	Description
plus_strand_coverage_ypd	For WT cells grown in glucose media (ypd), this track shows the number of transcripts covering each position on the plus strand.
plus_strand_coverage_gal	For WT cells grown in galactose media (gal), this track shows the number of transcripts covering each position on the plus strand.
minus_strand_coverage_ypd	For WT cells grown in glucose media (ypd), this track shows the number of transcripts covering each position on the minus strand.
minus_strand_coverage_gal	For WT cells grown in galactose media (gal), this track shows the number of transcripts covering each position on the minus strand.
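
As a rough illustration of what a coverage track represents, the sketch below counts how many transcripts span each position of a region on one strand. It is a generic difference-array approach with made-up intervals, not the pipeline used to generate the .bw files.

```python
def coverage(transcripts, region_start, region_end):
    """Count transcripts covering each position in [region_start, region_end].

    `transcripts` is an iterable of (start, end) pairs, 1-based and inclusive,
    all assumed to lie on the same strand.
    """
    length = region_end - region_start + 1
    diff = [0] * (length + 1)
    for start, end in transcripts:
        lo = max(start, region_start)
        hi = min(end, region_end)
        if lo > hi:
            continue  # transcript lies entirely outside the region
        diff[lo - region_start] += 1
        diff[hi - region_start + 1] -= 1

    # A running sum of the difference array gives per-position depth.
    depth = []
    running = 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth


# Made-up transcripts on positions 1..8 of one strand.
depth = coverage([(1, 5), (3, 8), (4, 4)], 1, 8)
# depth == [1, 1, 2, 3, 2, 1, 1, 1]
```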

Categories: New Data, Tutorial

Proteome-wide abundance data

March 11, 2019

SGD has now incorporated proteome-wide protein abundance data obtained from a comprehensive meta-analysis by Ho et al., 2018. The authors normalized and combined 21 different S. cerevisiae protein abundance datasets—including data from both untreated cells and cells treated with various environmental stressors—to create a unified protein abundance dataset where all values are in the intuitive units of molecules per cell. The original datasets were initially obtained using different methodologies (mass spectrometry, fluorescence microscopy, flow cytometry, and TAP-immunoblot), allowing Ho et al. to evaluate the strengths and weaknesses of these methods in addition to providing the community with a comprehensive reference map of the yeast proteome.

Normalized abundance measurements and associated metadata from untreated and treated cells are displayed in tabular form in the experimental data section of protein-tabbed pages (e.g. CDC28). Several different controlled vocabularies have been employed to standardize the metadata display. In addition, calculated median abundance and median absolute deviation (MAD) values are displayed in the protein section of Locus Summary pages (e.g. PHO85). Two new YeastMine templates have been created to provide access to these data: Gene -> Protein Abundance and Gene -> Median Protein Abundance.
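
For readers unfamiliar with MAD, here is a minimal sketch of how a median and an unscaled median absolute deviation are computed from a set of measurements. The values are made up for illustration, and the sketch does not claim to reproduce any scaling or filtering that may have been applied to the values shown at SGD.

```python
from statistics import median


def median_and_mad(values):
    """Return (median, MAD), where MAD is the median of |x - median|."""
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med, mad


# Made-up abundance measurements (molecules per cell) for one protein.
measurements = [1000, 1200, 1500, 2000, 9000]
med, mad = median_and_mad(measurements)
# med == 1500, mad == 500
```

Because both statistics are medians, a single outlying dataset (9000 in this example) has little effect on either value, which is useful when combining measurements obtained with different methods.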

Special thanks to Brandon Ho and Grant Brown for generating this comprehensive reference map of protein abundance, and for their help in making this data available to the larger community.

Categories: New Data

New Data Tracks added to JBrowse

January 15, 2019

SGD has updated our JBrowse genome browser with 157 new data tracks related to genome-wide experiments and omics data for you to explore. You can easily access these new tracks, which visualize data from the twenty publications listed below, by entering JBrowse and clicking on the left-hand “Select tracks” tab. Then, search for the PMID associated with the reference of interest.

Note that some references appear more than once because their data tracks belong to more than one category in JBrowse.

For more information on using JBrowse, be sure to check out our playlist of JBrowse video tutorials on YouTube. If you have any questions or feedback about the new tracks or about our genome browser, please don’t hesitate to contact us.

Transcription & Transcriptional Regulation

Reference	PMID	Description in JBrowse
Baptista et al. (2017)	28918903	ChEC-seq to map the genome-wide binding of the SAGA coactivator complex in budding yeast.
Castelnuovo et al. (2014)	24497191	Genome-wide measurement of whole transcriptome versus histone modified mutants
El Hage et al. (2014)	25357144	Genome-wide distribution of RNA-DNA hybrids identifies RNase H targets in tRNA genes, retrotransposons and mitochondria.
Freeberg et al. (2013)	23409723	Mapped regions of untranslated, polyadenylated transcriptome bound by RNA-binding proteins (RBPs)
Kang et al. (2015)	25213602	Genome-wide transcript profiling by paired-end ditag sequencing
Lee et al. (2018)	29339748	ChIP-Seq, mRNA-seq, ATAC-seq, and MNase-seq samples in wild-type (WT) and various mutants were prepared using Saccharomyces cerevisiae.
Park et al. (2014)	24413663	Simultaneous mapping of RNA ends by sequencing (SMORE-seq) to identify the strongest transcription start sites and polyadenylation sites genome-wide
Rossbach et al. (2017)	28924058	Authors utilized the Calling Cards Ty5 retrotransposon insertion method to identify binding sites of cdc7kd, cdc7kdΔcterm and Gal4 transcription factor within the yeast genome.
Schaughency et al. (2014)	25299594	Genome-wide identification of transcription termination sites; pA pathway and non-polyadenylation pathway in strains missing Sen1p or Nrd1p

Histone Modification

Reference	PMID	Description in JBrowse
Castelnuovo et al. (2014)	24497191	Genome-wide measurement of whole transcriptome versus histone modified mutants
Hu J. et al. (2015)	26628362	ChIP-seq and MNase-seq to determine how histone modifications and chromatin structure directly regulate meiotic recombination. Identified acetylation of histone H4 at Lys44 (H4K44ac) as a new histone modification
Joo et al. (2017)	29203645	Next-Generation-Sequencing (NGS)-derived genome-wide occupancy of TAF (Taf1) compared with other basal initiation components (TBP and TFIIB), histones (H3, H4, Htz1 and H4 acetylation) and histone regulator complexes (Swr1, Bdf1) in S. cerevisiae
Kniewel et al. (2017)	28986445	ChIP-seq to determine the whole-genome enrichment of Mek1 targeted histone H3 threonine 11 phosphorylation (H3 T11ph) during Saccharomyces cerevisiae meiosis.
Lee et al. (2018)	29339748	ChIP-Seq, mRNA-seq, ATAC-seq, and MNase-seq samples in wild-type (WT) and various mutants were prepared using Saccharomyces cerevisiae.
Weiner et al. (2015)	25801168	Examining chromatin dynamics through genome-wide mapping of 26 histone modifications at 0, 4, 8, 15, 30, and 60 minutes after diamide addition using MNase-ChIP

Chromatin Organization

Reference	PMID	Description in JBrowse
Chereji et al. (2018)	29426353	Genome binding/occupancy profiling of single nucleosomes and linkers by high throughput sequencing
Gutierrez et al. (2017)	29212533	Authors sought to correct sequence bias of MNase-Seq with a method based on the digestion of naked DNA and the use of the bioinformatic tool DANPOS
Hu Z. et al. (2014)	24532716	Genome-wide measurement of nucleosome occupancy during cell aging
Hu J. et al. (2015)	26628362	ChIP-seq and MNase-seq to determine how histone modifications and chromatin structure directly regulate meiotic recombination. Identified acetylation of histone H4 at Lys44 (H4K44ac) as a new histone modification
Joo et al. (2017)	29203645	Next-Generation-Sequencing (NGS)-derived genome-wide occupancy of TAF (Taf1) compared with other basal initiation components (TBP and TFIIB), histones (H3, H4, Htz1 and H4 acetylation) and histone regulator complexes (Swr1, Bdf1) in S. cerevisiae
Lee et al. (2018)	29339748	ChIP-Seq, mRNA-seq, ATAC-seq, and MNase-seq samples in wild-type (WT) and various mutants were prepared using Saccharomyces cerevisiae.

RNA Catabolism

Reference	PMID	Description in JBrowse
Geisberg et al. (2014)	24529382	Half-lives of 21,248 mRNA 3′ isoforms in yeast were measured by rapidly depleting RNA polymerase II from the nucleus and performing direct RNA sequencing throughout the decay process.
Smith et al. (2014)	24931603	Identification of genome-wide transcripts; looking at nonsense-mediated RNA decay pathway

Transposons

Reference	PMID	Description in JBrowse
Lee et al. (2018)	29339748	ChIP-Seq, mRNA-seq, ATAC-seq, and MNase-seq samples in wild-type (WT) and various mutants were prepared using Saccharomyces cerevisiae.
Michel et al. (2017)	28481201	Genome-wide examination of protein function by using transposons for targeted gene disruption
Rossbach et al. (2017)	28924058	Authors utilized the Calling Cards Ty5 retrotransposon insertion method to identify binding sites of cdc7kd, cdc7kdΔcterm and Gal4 transcription factor within the yeast genome.

DNA Replication, Recombination, and Repair

Reference	PMID	Description in JBrowse
Mao et al. (2017)	28912372	Map of N-methylpurine (NMP) lesion alkylation damage across the yeast genome
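
Since tracks are found by searching for a PMID, it can help to know which PMIDs are listed under more than one category. The sketch below simply transcribes the PMIDs from the tables above into a dictionary (the `TRACKS_BY_CATEGORY` name is illustrative) and inverts it; it does not query JBrowse.

```python
from collections import defaultdict

# PMIDs transcribed from the category tables above.
TRACKS_BY_CATEGORY = {
    "Transcription & Transcriptional Regulation": [
        "28918903", "24497191", "25357144", "23409723", "25213602",
        "29339748", "24413663", "28924058", "25299594",
    ],
    "Histone Modification": [
        "24497191", "26628362", "29203645", "28986445", "29339748", "25801168",
    ],
    "Chromatin Organization": [
        "29426353", "29212533", "24532716", "26628362", "29203645", "29339748",
    ],
    "RNA Catabolism": ["24529382", "24931603"],
    "Transposons": ["29339748", "28481201", "28924058"],
    "DNA Replication, Recombination, and Repair": ["28912372"],
}

# Invert the mapping: PMID -> list of categories it is listed under.
categories_by_pmid = defaultdict(list)
for category, pmids in TRACKS_BY_CATEGORY.items():
    for pmid in pmids:
        categories_by_pmid[pmid].append(category)

# PMIDs that are listed under more than one category.
multi_category = {
    pmid: cats for pmid, cats in categories_by_pmid.items() if len(cats) > 1
}

for pmid in sorted(multi_category):
    print(pmid, "->", "; ".join(multi_category[pmid]))
```

The inverted mapping contains twenty distinct PMIDs, matching the twenty publications mentioned above, and five of them are listed under more than one category.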

Categories: New Data

Disease Pages at SGD: Linking Yeast Genetics and Human Disease

December 22, 2018

SGD’s Disease Ontology page for neurodegenerative disease

To promote the use of yeast as a catalyst for biomedical research, SGD utilizes the Disease Ontology (DO) to describe human diseases that are associated with yeast homologs. Disease Ontology annotations to yeast genes are now available through SGD’s new Disease pages. Each page corresponds to a Disease Ontology term, such as amyotrophic lateral sclerosis, and lists all yeast genes annotated to the term by SGD.

Yeast genes with one or more human disease associations will also have a new Disease Summary tab (example: MIP1), accessible from the genes’ respective locus pages. The Disease Summary tab shows all manually curated, high-throughput, and computational disease annotations for the yeast gene. Additionally, these pages feature a network diagram that depicts shared disease annotations for other yeast genes and their human homologs.

The shared disease annotations diagram for MIP1

For more information, check out SGD’s Disease Ontology help page. Explore the new Disease pages and features, and be sure to let us know if you have any feedback or questions.

Categories: New Data

Macromolecular Complex Pages Now Available

December 14, 2018

The GAL3-GAL80 transcription regulation complex page

Macromolecular complexes, already retrievable from SGD’s YeastMine data warehouse, are now available on new pages on the SGD website. These new Complex pages (example: GAL3-GAL80 complex) provide manually curated information about the complex as well as helpful links and diagrams. Key features of Complex pages include:

Manually curated summaries of the complex’s function and biology
A list of all known subunits and other complex participants
A Complex Diagram that shows the physical interactions between each subunit
Gene Ontology (GO) terms annotated to the complex
Images of complex structure from the Protein Data Bank (PDB), if available
A network diagram that shows how the complex relates to other complexes in terms of function and shared subunits

Complex pages can be accessed by running a search for the complex, or by visiting the gene summary pages of its subunits. For example, to find the GAL3-GAL80 complex page, simply run a search for “GAL3-GAL80” and click on the Complexes category (symbolized by the gold dot). Or, go to the GAL3 or GAL80 gene page and locate the Complex section.

SGD curated these macromolecular complex data in collaboration with curators at EMBL-EBI’s Complex Portal. Be sure to check out the page for your favorite complex, and let us know if you have any feedback or questions.

Categories: New Data

Out of China: Changing our Views on the Origins of Budding Yeast

April 17, 2018

1,011. That’s the number of different Saccharomyces cerevisiae yeast strains that were whole-genome sequenced and phenotyped by a team of researchers jointly led by Joseph Schacherer and Gianni Liti, published this week in Nature (Peter et al., 2018; data at: http://bit.ly/1011genomes-DataAtSGD).

Ecological origins of the 1,011 isolates (from Peter et al., 2018; Creative Commons license)

Scrupulously gathering isolates of S. cerevisiae from as many diverse geographical locations and ecological niches as possible, the authors and their collaborators plucked yeast cells not only from the familiar wine, beer and bread sources, but also from rotting bananas, sea water, human blood, sewage, termite mounds, and more. The authors then surveyed the evolutionary relationships among the strains to describe the worldwide population distribution of this species and deduce its historical spread.

They found that the greatest amount of genome sequence diversity existed among the S. cerevisiae strains collected from Taiwan, mainland China, and other regions of East Asia. This means that in all likelihood the geographic origin of S. cerevisiae lies somewhere in East Asia. According to the authors, our budding yeast friend began spreading around the globe about 15,000 years ago, undergoing several independent domestication events during its worldwide journey. For example, it turns out that wine yeast and sake yeast were domesticated from different ancestors, thousands of years apart from each other. Whereas genomic markers of domestication appeared about 4,000 years ago in sake yeast, such markers appeared in wine yeast only 1,500 years ago.

Additionally — and similar to the situation where human interspecific hybridization with Neanderthals occurred only after humans migrated out of Africa — it appears that S. cerevisiae has inter-bred very frequently with other Saccharomyces species, especially S. paradoxus, but that most of these interspecific hybridization events occurred after the out-of-China dispersal.

There are many more gems to be found among the treasure trove of information in this paper. Some notable conclusions from the authors include: diploids are the most fit ploidy; copy number variation (CNV) is the most prevalent type of variation; most single nucleotide polymorphisms (SNPs) are very rare alleles in the population; extensive loss of heterozygosity is observed among many strains. There are also phenotype results (fitness values) for 971 strains across 36 different growth conditions.

As is often the case for yeast, the ability to sequence and analyze whole genomes at very deep coverage has yielded broad insights on eukaryotic genome evolution. The team’s work highlights this by presenting a comprehensive view of genome evolution on many different levels (e.g., differences in ploidy, aneuploidy, genetic variants, hybridization, and introgressions) that is difficult to obtain at the same scale and accuracy for other eukaryotic organisms.

SGD is happy to announce that in conjunction with the authors and publishers, we are hosting the datasets from the paper at this SGD download site. These datasets include: the actual genome sequences of the 1,011 isolates; the list of 4,940 common “core” ORFs plus 2,856 ORFs that are variable within the population (together these make up the “pangenome”); copy number variation (CNV) data; phenotyping data for 36 conditions; SNPs and indels relative to the S288C genome; and much more. We hope that the easy availability of these large datasets will be useful to many yeast (and non-yeast) researchers, and as the authors say, will help to “guide future population genomics and genotype–phenotype studies in this classic model system.”

Categories: Announcements, New Data

Tags: evolution, genome wide association study, Saccharomyces cerevisiae, strains

New Protein Half-life Data in SGD and YeastMine

September 08, 2016

Protein turnover for budding and fission yeast proteins, and scatterplot comparing homologous protein half-lives. Image from Cell Reports via Creative Commons license.

Ever wonder how quickly your favorite protein turns over within the cell? SGD has just incorporated half-life data for 3,700 yeast proteins from a paper by Christiano et al., 2014. In this study, Christiano and colleagues pulse-labeled exponentially growing wild-type yeast cells in synthetic medium with a heavy lysine isotope (pulse SILAC) and followed the decay of native untagged proteins using high-resolution mass spectrometry-based proteomics. The data generated in this study can be accessed by viewing the Experimental Data section of the Protein tab for your favorite gene, such as the short-lived Ctk1p or the long-lived Rsc1p.

In addition, you can retrieve these half-life data for one or more proteins using the YeastMine template Gene -> Protein Half-life, or obtain a list of proteins with half-lives within a given range using the template Retrieve -> Proteins with half-life in a given range. Both of these templates can be found in the “Templates” section of YeastMine under the “Protein” category.
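
Once half-life values have been exported from YeastMine, the same kind of range filter can be applied locally. The sketch below is a plain-Python illustration with placeholder protein names and made-up values; the assumption that half-lives are expressed in hours is ours, and the code does not call YeastMine.

```python
def proteins_in_range(half_lives, low, high):
    """Return the sorted names of proteins whose half-life is within [low, high]."""
    return sorted(
        name for name, value in half_lives.items() if low <= value <= high
    )


# Placeholder names and made-up half-lives (assumed unit: hours).
half_lives = {
    "PROT_A": 0.5,
    "PROT_B": 8.0,
    "PROT_C": 20.0,
    "PROT_D": 45.0,
}

selected = proteins_in_range(half_lives, 5, 25)
# selected == ["PROT_B", "PROT_C"]
```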

Thanks to Romain Christiano and Tobias Walther for their help integrating this information into SGD.

Categories: New Data