New & Noteworthy

Changes to Saccharomyces cerevisiae GFF3 file

March 01, 2024

The saccharomyces_cerevisiae.gff file contains sequence features of Saccharomyces cerevisiae and related information such as locus descriptions and GO annotations. It is fully compatible with Generic Feature Format Version 3 (GFF3) and is updated weekly.

After November 2020, SGD updated the transcripts in the GFF file to reflect the experimentally determined transcripts (Pelechano et al. 2013, Ng et al. 2020), when possible. The longest transcripts were determined for two different growth media: galactose and dextrose. When available, experimentally determined transcripts for one or both conditions were added for a gene. When these data were absent, transcripts matching the start and stop coordinates of the open reading frame (ORF) were used.

Old version: BDH2/YAL061W with longest transcripts expressed in GAL and in YPD.

Beginning in February 2024, SGD extended the start and stop coordinates of genes to encompass the start and stop coordinates of the longest experimentally determined transcripts, regardless of condition. This change was made to comply with JBrowse 2, a newer and more extensible genome browser, which requires that parent features in GFF files (genes) fully span their child features (mRNA, CDS, etc.) (Diesh et al., 2023).

After February 2024: BDH2/YAL061W with increased start/stop coordinates.
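
The parent/child requirement described above can be checked mechanically. The following is a minimal sketch, assuming a GFF3 file in which child features point to their parent through the standard `Parent` attribute. The function names and the sample coordinates are illustrative and are not taken from the SGD file.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse the GFF3 attributes column into a dict of key -> list of values."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        attributes[key] = [unquote(v) for v in value.split(",")]
    return attributes


def find_uncontained_children(gff_lines):
    """Return (child_id, parent_id) pairs where the child extends past its parent."""
    spans = {}      # feature ID -> (start, end)
    children = []   # (child ID, parent ID, start, end)
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the sequence section follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue
        start, end = int(columns[3]), int(columns[4])
        attributes = parse_attributes(columns[8])
        feature_id = attributes.get("ID", [None])[0]
        if feature_id is not None:
            spans[feature_id] = (start, end)
        for parent_id in attributes.get("Parent", []):
            children.append((feature_id, parent_id, start, end))

    problems = []
    for child_id, parent_id, start, end in children:
        if parent_id not in spans:
            continue  # parent not defined in this input; ignored in this sketch
        parent_start, parent_end = spans[parent_id]
        if start < parent_start or end > parent_end:
            problems.append((child_id, parent_id))
    return problems


# Made-up coordinates for illustration only (not real SGD annotations).
example = [
    "##gff-version 3",
    "chrI\tSGD\tgene\t100\t500\t.\t+\t.\tID=geneA",
    "chrI\tSGD\tmRNA\t80\t520\t.\t+\t.\tID=geneA_mRNA;Parent=geneA",
    "chrI\tSGD\tgene\t1000\t2000\t.\t-\t.\tID=geneB",
    "chrI\tSGD\tmRNA\t1000\t2000\t.\t-\t.\tID=geneB_mRNA;Parent=geneB",
]
print(find_uncontained_children(example))  # prints [('geneA_mRNA', 'geneA')]
```

In this toy input, the transcript of geneA is longer than the gene, which is the situation the February 2024 change removes. The sketch keeps only the last line seen for each ID, so it does not handle features that are split across several lines with the same ID.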

GFF3 is a standard format used by many groups. SGD uses the GFF file to load the reference tracks in SGD’s genome browser resource.

Categories: Announcements, Data updates

Tags: biology, blog, genetics, news, Saccharomyces cerevisiae

Allele SGDIDs added to YeastMine

September 28, 2023

YeastMine is SGD’s data warehouse, powered by InterMine. We have so many templates (i.e., pre-defined queries) that provide access to so many different kinds of data!

A big area of focus for SGD and the yeast community is alleles. Alleles are different versions of genes that vary in DNA and sometimes protein sequence. Did you know that you can easily and quickly get all curated yeast allele data directly from YeastMine?

From the YeastMine home page, click ‘Templates’ at top left. From there, filter for ‘allele’.

The Genes -> Alleles template returns data for one gene or a list of genes or the entire genome! Data include standard and systematic names for genes, gene name descriptions, allele names and descriptions, allele types, aliases, and references. SGDIDs for genes are included, and now SGDIDs for the alleles have been added. Previously, this query returned all of these data without the SGDIDs for the alleles. Based on user feedback, we have now made these allele SGDIDs available, so that they can be used to identify and distinguish different alleles. Enjoy!

There are thousands of alleles in SGD! Give the YeastMine Genes -> Alleles template a whirl! Get all the alleles for your favorite gene or list of genes.

For help using YeastMine, please see the SGD Help Pages and YouTube Channel.

Categories: Data updates, Website changes

Downloads files added to YeastMine

September 20, 2023

Back in the day, SGD maintained an FTP site to distribute data in various files. More recently, you have found these files in the SGD Downloads site. We have now moved these files to YeastMine:

From the YeastMine homepage, click Templates at top left. In the Filter, select ‘Downloads’ to constrain the list of templates.

The following templates are listed under Downloads:

Deleted Merged Features: Retrieve all deleted and merged features.

Retrieve Functional Complementation for genes: For gene(s), retrieve information about cross-species functional complementation between yeast and another species.

Retrieve GO Terms: Retrieve GO Terms, including name, ID, namespace, and definition.

Retrieve SGD chromosomal Features: Retrieve genes and other chromosomal features, including IDs, coordinates, and descriptions.

Retrieve all cross-references for all genes: Retrieve IDs for yeast gene and gene products in other databases.

Retrieve all domains of all genes: Retrieve Proteins/Genes that have a given domain.

Retrieve all interactions for all genes: Retrieve physical and genetic interactions for all genes.

Retrieve all pathways for all genes: Retrieve all metabolic pathways for all genes.

Retrieve protein properties of all proteins of ORFs: Retrieve protein properties, including pI, molecular weight, N-terminal and C-terminal sequences, codon bias, etc. of all proteins.

For help using YeastMine, please see the SGD Help Pages and YouTube Channel.

Categories: Data updates, Tutorial, Website changes

Reference Genome Annotation Update R64.4

September 08, 2023

The S. cerevisiae strain S288C reference genome annotation was updated. The new genome annotation is release R64.4.1, dated 2023-08-23. Note that the underlying genome sequence itself was not altered in any way.

This annotation update included:

Various sequence and annotation files are available on SGD’s Downloads site. You can find more update details on the Details of 2023 Reference Genome Annotation Update R64.4 SGD Wiki page. 

Categories: Data updates

Tags: genome annotation update, Saccharomyces cerevisiae

Predicted 3D Structures of Yeast Complexes

January 20, 2022

In an exciting new paper, Humphreys et al. describe the use of deep-learning-based algorithms to predict structures of not only single proteins, but assemblies of proteins. The team used rapid RoseTTAFold combined with the more accurate AlphaFold to build structural models for 106 previously unidentified protein assemblies and 806 complexes that had not been structurally characterized. The complexes have up to five subunits and are involved in numerous critical roles in cell biology.

Examples of predicted complexes from Humphreys et al.

Look for your own proteins of interest by searching from the ModelArchive home page. You can also find the link in the Resources section of the SGD Interaction and Protein pages.

Categories: Announcements, Data updates, Paper of the Week

Tags: protein complex, Saccharomyces cerevisiae, yeast protein assembly

Protein Complex Page Updates

December 01, 2021

SGD has updated our protein complex pages to have the same format as gene pages, with tabs across the top for each category of information, including a Summary page, a new Gene Ontology page, and a new Literature page for each complex. Just as we do for all of your favorite genes, Gene Ontology and Literature curation for complexes will be ongoing.

Summary page and new Literature page

If you have any questions or feedback about the updates to our complex pages, please do not hesitate to contact us at any time.

Categories: Announcements, Data updates, Website changes

Tags: protein complex, Saccharomyces cerevisiae

New links to AlphaFold 3D Predicted Protein Structure Database

November 09, 2021

  • The links through SGD give quick access to the EMBL European Bioinformatics Institute’s new, highly accurate tool for predicting protein structure.
  • Given a peptide sequence for an uncharacterized protein, AlphaFold will model predicted domains and provide relative confidence levels for each portion of the prediction.
  • The predicted domains can then be compared to known protein structures (using a tool such as PDBeFold to seek matches to characterized protein families).
  • Whether or not a family is identified, the comparison will yield clues to protein function to help design the next experiments.

Structure of Hog1p

Categories: Data updates

Tags: AlphaFold, new tools

Updates to legacy gene names

November 05, 2021

SGD has long been the keeper of the official Saccharomyces cerevisiae gene nomenclature. Robert Mortimer handed over this responsibility to SGD in 1993 after maintaining the yeast genetic map and gene nomenclature for 30 years. 

The accepted format for gene names in S. cerevisiae comprises three uppercase letters followed by a number. The letters typically signify a phrase (referred to as the “Name Description” in SGD) that provides information about a function, mutant phenotype, or process related to that gene, for example “ADE” for “ADEnine biosynthesis” or “CDC” for “Cell Division Cycle”. Gene names for many types of chromosomal features follow this basic format regardless of the type of feature named, whether an ORF, a tRNA, another type of non-coding RNA, an ARS, or a genetic locus. Some S. cerevisiae gene names that pre-date the current nomenclature standards do not conform to this format, such as MRPL38, RPL1A, and OM45.
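
As a rough illustration of the format described above, the check below tests whether a name is exactly three uppercase letters followed by a number. This is a sketch of the stated pattern only, not SGD's validation logic; the function name is hypothetical, and ADE2 and CDC28 are used simply as familiar examples of conforming names.

```python
import re

# Three uppercase letters followed by a number, as described in the text.
STANDARD_GENE_NAME = re.compile(r"[A-Z]{3}[0-9]+")


def is_standard_gene_name(name):
    """Return True if the whole name is three uppercase letters plus digits."""
    return STANDARD_GENE_NAME.fullmatch(name) is not None


for name in ["ADE2", "CDC28", "RPL1A", "OM45"]:
    print(name, is_standard_gene_name(name))
```

RPL1A fails because of its trailing letter, and OM45 fails because it has only two leading letters.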

A few historical gene names predate both the nomenclature standards and the database, and were less computer-friendly than more recent gene names, due to the presence of punctuation. SGD recently updated these gene names to be consistent with current standards and to be more software-friendly by removing punctuation. The old names for these four genes have been retained as aliases.

ORF        Old gene name    New gene name
YGL234W    ADE5,7           ADE57
YER069W    ARG5,6           ARG56
YBR208C    DUR1,2           DUR12
YIL154C    IMP2′            IMP21
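
For scripts that still contain the old names, a small lookup table is one way to resolve them. The sketch below copies the four old and new names from the table above; the helper names are hypothetical. It also shows that dropping punctuation reproduces three of the new names, while IMP2′ needs an explicit mapping because its prime became the digit 1.

```python
# Old name -> new name, copied from the table above.
RENAMED = {
    "ADE5,7": "ADE57",
    "ARG5,6": "ARG56",
    "DUR1,2": "DUR12",
    "IMP2′": "IMP21",  # the old name ends in a prime character
}


def strip_punctuation(name):
    """Keep only ASCII letters and digits."""
    return "".join(ch for ch in name if ch.isascii() and ch.isalnum())


def current_name(name):
    """Resolve a legacy alias to its current name; other names pass through."""
    return RENAMED.get(name, name)


for old, new in RENAMED.items():
    print(old, "->", new, "| stripping alone gives", strip_punctuation(old))
```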

Categories: Announcements, Data updates

Tags: gene nomenclature

Reference Genome Annotation Update R64.3

August 03, 2021

The S. cerevisiae strain S288C reference genome annotation was updated in its first major update since 2014. The new genome annotation is release R64.3, which was released on April 21, 2021. Note that the underlying sequence of 16 assembled nuclear chromosomes, plus the mitochondrial genome, remained unchanged in annotation release R64.3.1 (relative to genome sequence release R64.2.1).

This annotation update included:

Various sequence and annotation files are available on SGD’s Downloads site. You can find more update details and read about the new systematic nomenclature system for noncoding RNA genes on the Details of 2021 Reference Genome Annotation Update R64.3 SGD Wiki page. 

Categories: Data updates

SGD Homology Data Now Available On New Homology Pages

March 25, 2021

SGD is excited to introduce our new Homology Pages! These pages can be accessed by clicking on the Homology tab in the header of SGD gene pages, as seen below.

The information displayed on the Homology Pages is divided into several sections:

  • Homologs: Information about known homologs for the gene of interest, such as the species of the homolog, the corresponding Gene ID from the Alliance of Genome Resources, and the name of the homolog.
  • Functional Complementation: Data about cross-species functional complementation between yeast and other species, curated by SGD and the Princeton Protein Orthology Database (P-POD).
  • Fungal Homologs: Curated homolog information for 24 additional species of fungi. View the species of the fungal homolog, the database source of the entry, and the Gene ID of the homolog from that database.
  • External Identifiers: A list of external identifiers for the protein from various database sources.

If you have any questions or feedback regarding our new Homology Pages, please do not hesitate to contact us at any time.

Categories: Data updates, Homologs, New Data, Yeast and Human Disease
