New & Noteworthy

Yeast Biochemical Pathways incorporated into Gene Ontology annotations

April 09, 2025

YeastPathways, the database of metabolic pathways and enzymes in the budding yeast Saccharomyces cerevisiae, is manually curated and maintained by the biocuration team at SGD.

This resource is jam-packed with information, but it has been somewhat hidden from view. We have recently taken several steps to make the pathways more accessible. Some time ago, we added a section with pathway links to the relevant gene pages (e.g., DFR1).

We also made the pathways available in SGD Search.

Now we have converted the metabolic pathways and their associated genes and enzymes into Gene Ontology (GO) annotations (e.g., DFR1).

Because many fundamental molecular processes and pathways are evolutionarily conserved between yeast and higher eukaryotes, including humans, the curated metabolic pathway information is valuable for transferring knowledge to other organisms. For this reason, the YeastPathways data were exported in BioPAX format (Demir et al. 2010) for import into Noctua, a tool for collaborative curation of biological pathways and gene annotations developed by the GO Consortium (Thomas et al. 2019). BioPAX is a standardized format for representing biological pathways that allows researchers to integrate pathway information from different sources and databases. Noctua can import BioPAX-encoded data to populate its pathway editor with molecular interactions, biological processes, and regulatory relationships, and can combine BioPAX files from multiple datasets for pathway curation and analysis.

Pathways curated and edited in Noctua can be exported either as GO annotations for yeast genes and orthologous genes in other species, or as pathway annotations in BioPAX. The BioPAX export makes it easy to share curated pathways with other researchers, databases, and pathway analysis tools in a standard format, promoting data exchange and collaboration within the scientific community.
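To give a feel for what a BioPAX file looks like to software, here is a minimal sketch that lists the pathways in a BioPAX Level 3 document using only Python's standard-library XML parser. It assumes the common RDF/XML serialization; the function name `list_pathways` and the tiny example document (including `hypothetical_pathway_1`) are illustrative and are not taken from an actual YeastPathways export.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and BioPAX Level 3 (assumes an RDF/XML serialization)
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP_NS = "http://www.biopax.org/release/biopax-level3.owl#"


def list_pathways(biopax_xml: str) -> list[tuple[str, str, int]]:
    """Return (identifier, displayName, number of pathwayComponent links) per bp:Pathway."""
    root = ET.fromstring(biopax_xml)
    pathways = []
    for pathway in root.iter(f"{{{BP_NS}}}Pathway"):
        # Exporters identify resources with either rdf:ID or rdf:about
        ident = pathway.get(f"{{{RDF_NS}}}ID") or pathway.get(f"{{{RDF_NS}}}about") or ""
        name_el = pathway.find(f"{{{BP_NS}}}displayName")
        name = (name_el.text or "").strip() if name_el is not None else ""
        steps = len(pathway.findall(f"{{{BP_NS}}}pathwayComponent"))
        pathways.append((ident, name, steps))
    return pathways


# Hypothetical, heavily trimmed BioPAX document for illustration only
EXAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:Pathway rdf:ID="hypothetical_pathway_1">
    <bp:displayName>example pathway</bp:displayName>
    <bp:pathwayComponent rdf:resource="#reaction_1"/>
    <bp:pathwayComponent rdf:resource="#reaction_2"/>
  </bp:Pathway>
</rdf:RDF>"""

print(list_pathways(EXAMPLE))  # [('hypothetical_pathway_1', 'example pathway', 2)]
```

Real exports cross-reference reactions, proteins, and small molecules throughout the file, so for serious work an RDF library such as rdflib or a dedicated BioPAX toolkit such as Paxtools is a better fit than raw XML parsing.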

Categories: Data updates

Reference Genome Annotation Update R64.5

June 19, 2024

The S. cerevisiae strain S288C reference genome annotation was updated. The new genome annotation is release R64.5.1, dated 2024-05-29. The underlying genome sequence was not altered; the chromosome sequences remain unchanged.

R64.5 Annotation update summary

This annotation update included (details in table below):

new ORFs: YDL204W-A, YFR035W-A, YGR016C-A, YMR106W-A, YNL040C-A, YNL155C-A
new uORFs for existing ORFs: ATG12/YBR217W, ATG19/YOL082W, ATG5/YPL149W, ATG13/YPR185W
move start downstream: EFM4/YIL064W
ORF upgraded from Dubious to Verified: YIL059C

R64.5 Annotation update details

Chr	Feature	Description of change	Reference
II	ATG12/YBR217W	New uORF chrII:657824..657835, partially overlaps CDS	Yang Y, et al. (2023) PMID:35363116
IV	YDL204W-A	New ORF chrIV:94133..94285	Wacholder A, et al. (2023) PMID:37164009
VI	YFR035W-A	New ORF chrVI:226260..226550	Wacholder A and Carvunis AR (2023) PMID:38048358
VII	YGR016C-A	New ORF chrVII:523353..523246	Wacholder A, et al. (2023) PMID:37164009, Chang S, et al. (2023) PMID:37927910
IX	EFM4/YIL064W	Move start 84 nucleotides downstream, new coordinates chrIX:242027..242716	Hamey JJ, et al. (2024) PMID:38199565
IX	YIL059C	Change ORF qualifier from Dubious to Verified because stable translation product detected	Wacholder A and Carvunis AR (2023) PMID:38048358
XIII	YMR106W-A	New ORF chrXIII:480924..481187	Wacholder A and Carvunis AR (2023) PMID:38048358
XIV	YNL040C-A	New ORF chrXIV:552558..552478	Wacholder A, et al. (2023) PMID:37164009
XIV	YNL155C-A	New ORF chrXIV:342135..341911	Wacholder A and Carvunis AR (2023) PMID:38048358
XV	ATG19/YOL082W	New uORF chrXV:168632..168679	Yang Y, et al. (2023) PMID:35363116
XVI	ATG5/YPL149W	4 new uORFs: chrXVI:271236..271277, chrXVI:271252..271302, chrXVI:271299..271307, chrXVI:271302..271307	Yang Y, et al. (2023) PMID:35363116
XVI	ATG13/YPR185W	New uORF chrXVI:907211..907351, partially overlaps CDS	Yang Y, et al. (2023) PMID:35363116
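The coordinates in the table use a chrN:start..end notation, and Crick-strand (C) features appear to be written with the larger coordinate first (for example, YGR016C-A at chrVII:523353..523246). The helper below is a hypothetical sketch that parses such strings under that assumption and reports strand, length, and codon count, a quick way to confirm that each new feature spans a whole number of codons.

```python
import re
from dataclasses import dataclass

# Matches strings such as "chrVII:523353..523246" (chromosome, first coordinate, second coordinate)
COORD_RE = re.compile(r"chr(\w+):(\d+)\.\.(\d+)")


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # 1-based, inclusive; always the smaller coordinate
    end: int     # 1-based, inclusive; always the larger coordinate
    strand: str  # "+" (Watson) or "-" (Crick)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_coords(text: str) -> Interval:
    """Parse an SGD-style coordinate string into a strand-aware interval."""
    match = COORD_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate string: {text!r}")
    chrom = match.group(1)
    first, second = int(match.group(2)), int(match.group(3))
    # Assumption: a descending range (first > second) marks a Crick-strand feature
    strand = "-" if first > second else "+"
    return Interval(chrom, min(first, second), max(first, second), strand)


if __name__ == "__main__":
    for coords in ["chrIV:94133..94285", "chrVII:523353..523246", "chrXVI:271302..271307"]:
        iv = parse_coords(coords)
        whole = "whole codons" if iv.length % 3 == 0 else "NOT a multiple of 3"
        print(coords, iv.strand, iv.length, "nt", iv.length // 3, "codons", f"({whole})")
```

Normalizing every interval to start <= end, with the strand kept separately, mirrors how GFF3 stores coordinates and avoids off-by-one mistakes when computing lengths.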

Categories: Data updates

Changes to Saccharomyces cerevisiae GFF3 file

March 01, 2024

The saccharomyces_cerevisiae.gff file contains sequence features of Saccharomyces cerevisiae along with related information such as locus descriptions and GO annotations. It is fully compatible with Generic Feature Format Version 3 (GFF3) and is updated weekly.

After November 2020, SGD updated the transcripts in the GFF file to reflect experimentally determined transcripts (Pelechano et al. 2013, Ng et al. 2020) when possible. The longest transcripts were determined for two different growth media: galactose and dextrose. When available, experimentally determined transcripts for one or both conditions were added for a gene. When these data were absent, transcripts matching the start and stop coordinates of the open reading frame (ORF) were used.

Old version: BDH2/YAL061W with longest transcripts expressed in GAL and in YPD.

Beginning in February 2024, SGD extended the start and stop coordinates of genes to encompass those of the longest experimentally determined transcripts, regardless of condition. This change was made to comply with JBrowse 2, a newer and more extensible genome browser, which requires that parent features in GFF files (genes) span their child features (mRNA, CDS, etc.) (Diesh et al., 2023).

After February 2024: BDH2/YAL061W with extended start/stop coordinates.

GFF3 is a standard format used by many groups. SGD uses the GFF file to load the reference tracks in its genome browser.
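The parent/child rule that JBrowse 2 expects can be checked with a short script. The sketch below is a hypothetical helper, not an SGD or JBrowse tool: it reads GFF3 text, records each feature's span by its `ID`, and reports any child whose `Parent` does not fully contain it. It assumes each parent `ID` occurs on a single line and does not URL-decode attribute values. The `widened_span` helper shows, in its simplest form, the kind of adjustment described above: the smallest span covering a gene and all of its transcripts.

```python
def parse_attributes(field: str) -> dict[str, str]:
    """Split a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict (no URL-decoding)."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def containment_violations(gff_text: str) -> list[tuple[str, str]]:
    """Return (child, parent) pairs where a child feature lies outside its parent's span."""
    spans = {}  # feature ID -> (seqid, start, end); assumes one line per parent ID
    links = []  # (child label, parent ID, seqid, start, end)
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            spans[attrs["ID"]] = (seqid, start, end)
        # A feature may list several comma-separated parents
        for parent in filter(None, attrs.get("Parent", "").split(",")):
            links.append((attrs.get("ID", cols[2]), parent, seqid, start, end))

    violations = []
    for child, parent, seqid, start, end in links:
        if parent in spans:
            p_seqid, p_start, p_end = spans[parent]
            if seqid != p_seqid or start < p_start or end > p_end:
                violations.append((child, parent))
    return violations


def widened_span(intervals: list[tuple[int, int]]) -> tuple[int, int]:
    """Smallest (start, end) that covers every interval, e.g. a gene plus its transcripts."""
    starts, ends = zip(*intervals)
    return min(starts), max(ends)
```

Checking all pairs after reading the whole file, rather than line by line, keeps the result independent of whether parents appear before or after their children.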

Categories: Announcements, Data updates

Tags: biology, blog, genetics, news, Saccharomyces cerevisiae

Allele SGDIDs added to YeastMine

September 28, 2023

YeastMine is SGD’s data warehouse, powered by InterMine. We have so many templates (i.e., pre-defined queries) that provide access to so many different kinds of data!

A big area of focus for SGD and the yeast community is alleles. Alleles are different versions of genes that vary in DNA and sometimes protein sequence. Did you know that you can easily and quickly get all curated yeast allele data directly from YeastMine?

From the YeastMine home page, click ‘Templates‘ at top left. From there, filter for ‘allele’.

The Genes -> Alleles template returns data for one gene, a list of genes, or the entire genome! Data include standard and systematic names for genes, gene name descriptions, allele names and descriptions, allele types, aliases, and references. The template has always included gene SGDIDs; based on user feedback, it now also returns SGDIDs for the alleles, so that they can be used to identify and distinguish different alleles. Enjoy!

There are thousands of alleles in SGD! Give the YeastMine Genes -> Alleles template a whirl! Get all the alleles for your favorite gene or list of genes.
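Once you have run the template, its results can be exported as a tab-delimited file and processed locally. The sketch below groups allele names and allele SGDIDs under each gene. The column names (`gene_systematic_name`, `allele_name`, `allele_sgdid`) are hypothetical placeholders, so rename them to match the header row of your actual export; it also assumes that the same gene/allele pair may appear on several rows (for example, once per reference) and keeps only the first occurrence.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names: rename them to match the header row of your export
GENE_COL = "gene_systematic_name"
ALLELE_NAME_COL = "allele_name"
ALLELE_ID_COL = "allele_sgdid"


def alleles_by_gene(tsv_text: str) -> dict[str, list[tuple[str, str]]]:
    """Group (allele name, allele SGDID) pairs under each gene, skipping repeated rows."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        entry = (row[ALLELE_NAME_COL], row[ALLELE_ID_COL])
        if entry not in groups[row[GENE_COL]]:
            groups[row[GENE_COL]].append(entry)
    return dict(groups)
```

Keying on the allele SGDID rather than the allele name alone is what lets two differently described alleles with similar names stay distinct.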

For help using YeastMine, please see the SGD Help Pages and YouTube Channel.

Categories: Data updates, Website changes

Downloads files added to YeastMine

September 20, 2023

Back in the day, SGD maintained an FTP site to distribute data in various files. More recently, these files have been available on the SGD Downloads site. We have now moved these files to YeastMine.

From the YeastMine homepage, click Templates at top left. In the Filter, select ‘Downloads’ to constrain the list of templates.

The following templates are listed under Downloads:

• Deleted Merged Features: Retrieve all deleted and merged features.

• Retrieve Functional Complementation for genes: For gene(s), retrieve information about cross-species functional complementation between yeast and another species.

• Retrieve GO Terms: Retrieve GO Terms, including name, ID, namespace, and definition.

• Retrieve SGD chromosomal Features: Retrieve genes and other chromosomal features, including IDs, coordinates, and descriptions.

• Retrieve all cross-references for all genes: Retrieve IDs for yeast gene and gene products in other databases.

• Retrieve all domains of all genes: Retrieve Proteins/Genes that have a given domain.

• Retrieve all interactions for all genes: Retrieve physical and genetic interactions for all genes.

• Retrieve all pathways for all genes: Retrieve all metabolic pathways for all genes.

• Retrieve protein properties of all proteins of ORFs: Retrieve protein properties, including pI, molecular weight, N-terminal and C-terminal sequences, codon bias, etc. of all proteins.
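These templates can also be run programmatically. InterMine-based sites such as YeastMine expose a web service with a `template/results` endpoint that takes the template name, optional numbered constraints, and an output format. The sketch below only builds such a request URL. The base URL is an assumption to confirm on the YeastMine site, and `Hypothetical_Downloads_Template` is a placeholder: a template's internal name can differ from the title shown in the list, so copy the exact name and constraint paths from the template's page.

```python
from collections.abc import Iterable
from urllib.parse import urlencode

# Assumed base URL of the YeastMine web service; confirm it on the YeastMine site
YEASTMINE_SERVICE = "https://yeastmine.yeastgenome.org/yeastmine/service"


def template_results_url(
    name: str,
    constraints: Iterable[tuple[str, str, str]] = (),
    fmt: str = "tab",
) -> str:
    """Build an InterMine template/results URL.

    `constraints` holds (path, op, value) tuples, numbered 1, 2, ... in the
    order the template defines its editable constraints.
    """
    params = [("name", name), ("format", fmt)]
    for i, (path, op, value) in enumerate(constraints, start=1):
        params += [(f"constraint{i}", path), (f"op{i}", op), (f"value{i}", value)]
    return f"{YEASTMINE_SERVICE}/template/results?{urlencode(params)}"


# Placeholder template name; a "for all genes" download may need no constraints
print(template_results_url("Hypothetical_Downloads_Template"))
```

The resulting URL can be opened in a browser or fetched with `urllib.request.urlopen`; the `intermine` Python client package is another option if you prefer a higher-level interface.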

For help using YeastMine, please see the SGD Help Pages and YouTube Channel.

Categories: Data updates, Tutorial, Website changes

Reference Genome Annotation Update R64.4

September 08, 2023

The S. cerevisiae strain S288C reference genome annotation was updated. The new genome annotation is release R64.4.1, dated 2023-08-23. Note that the underlying genome sequence itself was not altered in any way.

This annotation update included:

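new uORFs for 3 ORFs: FPS1/YLL043W, ACC1/YNR016C, HOL1/YNR055C
8 new ncRNAs: SUT035/YNCC0015W, SUT053/YNCD0033W, SUT468/YNCD0034C, SUT532/YNCG0047C, SUT125/YNCG0048W, SUT126/YNCG0049W, SUT390/YNCP0025W, SUT418/YNCP0026W
3 ORFs demoted from ‘Uncharacterized’ to ‘Dubious’ at the request of NCBI because they overlap tRNAs: YDR278C, YOL013W-A, YPR108W-A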

R64.4 Annotation update details

Chr	Feature	Description of change	Reference
III	SUT035/YNCC0015W	New ncRNA chrIII:205766..205942 (+ strand)	Xu Z, et al. (2009) PMID:19169243, Balarezo-Cisneros LN, et al. (2021) PMID:33493158
IV	YDR278C	Change ORF qualifier from Uncharacterized to Dubious	Requested by NCBI
IV	SUT053/YNCD0033W	New ncRNA chrIV:506334..507774 (+ strand)	Xu Z, et al. (2009) PMID:19169243, Balarezo-Cisneros LN, et al. (2021) PMID:33493158
IV	SUT468/YNCD0034C	New ncRNA chrIV:506546..507450 (- strand)	Xu Z, et al. (2009) PMID:19169243, Balarezo-Cisneros LN, et al. (2021) PMID:33493158
VII	SUT532/YNCG0047C	New ncRNA chrVII:17213..17709 (- strand)	Xu Z, et al. (2009) PMID:19169243, Balarezo-Cisneros LN, et al. (2021) PMID:33493158
VII	SUT125/YNCG0048W	New ncRNA chrVII:650855..651159 (+ strand)	Xu Z, et al. (2009) PMID:19169243, Balarezo-Cisneros LN, et al. (2021) PMID:33493158, Feng MW, et al. (2022) PMID:36712349
VII	SUT126/YNCG0049W	New ncRNA chrVII:660087..661399 (+ strand)	Xu Z, et al. (2009) PMID:19169243, Balarezo-Cisneros LN, et al. (2021) PMID:33493158
XII	FPS1/YLL043W	New uORF uORF2 3 codons chrXII:49924..49932 (+ strand) ATGCATTAA	Cartwright SP, et al. (2017) PMID:28279185
XIV	ACC1/YNR016C	New uORF 4 codons chrXIV:661704..661715 (- strand) ATGTGTTTATAA	Blank HM, et al. (2017) PMID:28057705
XIV	HOL1/YNR055C	New uORF 7 codons chrXIV:730381..730401 (- strand) ATGCTATTACTACCAAGTTGA	Vindu A, et al. (2021) PMID:34375581
XV	YOL013W-A	Change ORF qualifier from Uncharacterized to Dubious	Requested by NCBI
XVI	SUT390/YNCP0025W	New ncRNA chrXVI:52977..53465 (+ strand)	Xu Z, et al. (2009) PMID:19169243, Feng MW, et al. (2022) PMID:36712349
XVI	SUT418/YNCP0026W	New ncRNA chrXVI:588998..589830 (+ strand)	Xu Z, et al. (2009) PMID:19169243, Feng MW, et al. (2022) PMID:36712349
XVI	YPR108W-A	Change ORF qualifier from Uncharacterized to Dubious	Requested by NCBI
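The three uORF rows list coordinates, a codon count, and the sequence itself, so each row can be checked for internal consistency. The sketch below is a hypothetical helper (not an SGD tool) that assumes, as the listed sequences imply, that the codon count includes the stop codon and that sequences are written 5' to 3' on the feature's own strand. It verifies the length against the coordinates, an ATG start, and a single in-frame stop at the end.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_uorf(start: int, end: int, seq: str, codons: int) -> bool:
    """Return True if a uORF table row is internally consistent.

    Checks that the sequence length equals the coordinate span and the codon
    count (stop codon included), that it begins with ATG, and that its only
    in-frame stop codon is the last one.
    """
    seq = seq.upper()
    length = end - start + 1
    if codons < 2 or len(seq) != length or length != 3 * codons:
        return False  # a uORF needs at least a start and a stop codon
    triplets = [seq[i:i + 3] for i in range(0, length, 3)]
    return (
        triplets[0] == "ATG"
        and triplets[-1] in STOP_CODONS
        and not any(t in STOP_CODONS for t in triplets[:-1])
    )


# (gene, start, end, sequence, codons) copied from the R64.4 table
UORF_ROWS = [
    ("FPS1/YLL043W", 49924, 49932, "ATGCATTAA", 3),
    ("ACC1/YNR016C", 661704, 661715, "ATGTGTTTATAA", 4),
    ("HOL1/YNR055C", 730381, 730401, "ATGCTATTACTACCAAGTTGA", 7),
]

for gene, start, end, seq, codons in UORF_ROWS:
    print(gene, "consistent" if check_uorf(start, end, seq, codons) else "check this row")
```

All three rows pass: for example, the HOL1 uORF reads ATG CTA TTA CTA CCA AGT TGA, seven codons ending in a TGA stop.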

Various sequence and annotation files are available on SGD’s Downloads site.

Categories: Data updates

Tags: genome annotation update, Saccharomyces cerevisiae

Predicted 3D Structures of Yeast Complexes

January 20, 2022

In an exciting new paper, Humphreys et al. describe the use of deep-learning-based algorithms to predict the structures not only of single proteins but also of protein assemblies. The team combined the fast RoseTTAFold with the more accurate AlphaFold to build structural models for 106 previously unidentified protein assemblies and 806 complexes that had not been structurally characterized. The complexes have up to five subunits and play critical roles in many areas of cell biology.

Examples of predicted complexes from Humphreys et al.

Look for your own proteins of interest by searching from the ModelArchive home page. A link is also available in the Resources section of the SGD Interaction and Protein pages.

Categories: Announcements, Data updates, Paper of the Week

Tags: protein complex, Saccharomyces cerevisiae, yeast protein assembly

Protein Complex Page Updates

December 01, 2021

SGD has updated its protein complex pages to have the same format as gene pages, with tabs across the top for each category of information, including a Summary page, a new Gene Ontology page, and a new Literature page for each complex. Just as we do for all of your favorite genes, Gene Ontology and Literature curation for complexes will be ongoing.

If you have any questions or feedback about the updates to our complex pages, please do not hesitate to contact us at any time.

Categories: Announcements, Data updates, Website changes

Tags: protein complex, Saccharomyces cerevisiae

New links to AlphaFold 3D Predicted Protein Structure Database

November 09, 2021

Would you like to see the shape of your protein?

SGD now contains links to AlphaFold in the Resources section of the Summary, Protein and Homology pages for every gene.

These links give quick access to the AlphaFold Protein Structure Database, developed by DeepMind and EMBL's European Bioinformatics Institute (EMBL-EBI), which provides highly accurate predicted protein structures.
Given the amino acid sequence of an uncharacterized protein, AlphaFold predicts its 3D structure, including its domains, and provides a per-residue confidence score (pLDDT) for each portion of the prediction.
The predicted domains can then be compared to known protein structures (using a tool such as PDBeFold to seek matches to characterized protein families).
Whether or not a family is identified, the comparison will yield clues to protein function to help design the next experiments.
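Before comparing a prediction to known structures, it helps to know which regions AlphaFold is confident about. The sketch below assumes a PDB-format model file downloaded from an AlphaFold DB entry page, where the per-residue pLDDT is stored in the B-factor column, and a single-chain model. The function names are hypothetical; the band thresholds (above 90, 70, and 50) follow the confidence categories shown in the AlphaFold DB viewer.

```python
from collections import Counter


def plddt_by_residue(pdb_text: str) -> dict[int, float]:
    """Map residue number -> pLDDT, read from the B-factor column of each CA atom."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, residue number 23-26, B-factor 61-66
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    """Bin a pLDDT score into the AlphaFold DB confidence categories."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def band_summary(scores: dict[int, float]) -> Counter:
    """Count residues in each confidence band."""
    return Counter(confidence_band(s) for s in scores.values())
```

Residues in the "very low" band often correspond to disordered regions, so trimming them before running a structural comparison such as PDBeFold can make matches to characterized domains easier to spot.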

Categories: Data updates

Tags: AlphaFold, new tools

Updates to legacy gene names

November 05, 2021

SGD has long been the keeper of the official Saccharomyces cerevisiae gene nomenclature. Robert Mortimer handed over this responsibility to SGD in 1993 after maintaining the yeast genetic map and gene nomenclature for 30 years.

The accepted format for gene names in S. cerevisiae comprises three uppercase letters followed by a number. The letters typically signify a phrase (referred to as the “Name Description” in SGD) that provides information about a function, mutant phenotype, or process related to that gene, for example “ADE” for “ADEnine biosynthesis” or “CDC” for “Cell Division Cycle”. Names for many types of chromosomal features follow this basic format, whether the feature is an ORF, a tRNA, another type of non-coding RNA, an ARS, or a genetic locus. Some S. cerevisiae gene names that predate the current nomenclature standards do not conform to this format, such as MRLP38, RPL1A, and OM45.

A few historical gene names predate both the nomenclature standards and the database and contain punctuation, which made them less computer-friendly than more recent gene names. SGD recently updated these gene names to be consistent with current standards and more software-friendly by removing or replacing the punctuation. The old names for these four genes have been retained as aliases.

ORF	Old gene name	New gene name
YGL234W	ADE5,7	ADE57
YER069W	ARG5,6	ARG56
YBR208C	DUR1,2	DUR12
YIL154C	IMP2′	IMP21
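The naming rule and the renames in the table translate directly into a small check. The sketch below is a hypothetical helper: it tests whether a name matches the basic three-letters-plus-number pattern described above and resolves the four old names to their current forms. It does not capture every nuance of yeast nomenclature; for example, it rejects legitimate older names such as RPL1A.

```python
import re

# Basic format described above: three uppercase letters followed by a number
STANDARD_NAME = re.compile(r"[A-Z]{3}[0-9]+")

# Renamed genes from the table above; SGD keeps the old names as aliases
RENAMED = {"ADE5,7": "ADE57", "ARG5,6": "ARG56", "DUR1,2": "DUR12", "IMP2′": "IMP21"}


def follows_standard(name: str) -> bool:
    """True if the whole name is three uppercase letters followed by digits."""
    return STANDARD_NAME.fullmatch(name) is not None


def current_name(name: str) -> str:
    """Map a retired punctuated name to its current name; other names pass through."""
    return RENAMED.get(name, name)


for name in ["ADE2", "CDC28", "ADE5,7", "MRLP38", "RPL1A", "OM45", "IMP2′"]:
    print(f"{name}: standard={follows_standard(name)}, current={current_name(name)}")
```

All four new names match the basic pattern, which is what makes them easier to handle in scripts, spreadsheets, and file names than the punctuated originals.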

Categories: Announcements, Data updates

Tags: gene nomenclature