
SGD Help: Fungal Alignment

This resource displays Saccharomyces cerevisiae sequences aligned with predicted orthologs in several other fungi. See the Sequences: Sources and References section for a current list of fungal species.

Contents

  1. Display and Navigation
    1. Dendrogram
    2. Alignment Options
    3. Download Options
    4. Symbol Key
    5. Sequence Alignment
    6. Individual Sequences
  2. Alignment Methods
  3. Sequences: Sources and References
  4. Caveats

Display and Navigation

Dendrogram

The top section (boxed in blue) shows a dendrogram illustrating the similarity of the aligned sequences. Species are color-coded according to the source of the sequence. If the common regions of the aligned sequences are identical, then this dendrogram cannot be generated and will not be included.
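The page does not say how the dendrogram is computed. As an illustration only, the sketch below assumes a simple p-distance between every pair of aligned sequences: the fraction of mismatched residues over the columns where neither sequence has a gap. The function names are hypothetical. The sketch shows why identical common regions leave nothing to draw, because every pairwise distance is then zero.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatches over columns where neither aligned sequence has a gap ("-")."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    shared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not shared:
        return 0.0
    return sum(1 for x, y in shared if x != y) / len(shared)


def pairwise_distances(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of named, aligned sequences."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


def dendrogram_possible(aligned: dict[str, str]) -> bool:
    """With all distances zero there is nothing to cluster on."""
    return any(d > 0 for d in pairwise_distances(aligned).values())


# Toy sequences for illustration only.
aln = {"Scer": "MDSEV-AL", "Spar": "MDSEVAAL", "Smik": "MDTEVAAL"}
print(pairwise_distances(aln))
print(dendrogram_possible({"a": "MDSEV", "b": "MDSEV"}))  # False
```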

Alignment Options

The Default Alignment

The default alignment is already displayed below the Symbol Key. When retrieving alignments from the "Fungal alignment" link on the Locus page, the default is a protein alignment of the "Best Hits" or "Orthologs" from all available fungal species. When retrieving alignments from the synteny viewer, the default is a protein alignment of all species contributed by Kellis et al.

The Alignment Options section

The Alignment Options section (periwinkle background) allows for alignments that are different from the default alignments. To view a different alignment, you must select the species/sequences to be included as well as the type of sequence.

    1. Select the species/sequences: There are two separate boxes that display the names of sequences available for alignment. Choose at least two sequences, from either or both boxes; to select more than one, hold down the Control (PC) or Command (Mac) key while clicking.
      • The Best Hits & Orthologs box lists the sequences that researchers have identified as orthologs or reciprocal best hits. Due to either incomplete sequence coverage or divergence between species, not all genes are available for all species. If a species or sequence is not shown, then it is not available. For advice on selecting species to compare, see the note on Saccharomyces groups, below (contributed by Mark Johnston).
      • The Other Hits box lists predicted ORFs that have a high degree of similarity to this S. cerevisiae ORF, but that were not identified as orthologs or reciprocal best hits with this or any other S. cerevisiae ORF. Note that not all closely related predicted ORFs are listed in this box. A complete list of similar predicted ORFs can be retrieved by using the Fungal Blast Tool.
    2. Select the type of alignment: Choose one of the following options by clicking on it.
      • Protein: This is the default option from both the synteny viewer and the locus page. It displays predicted translation products of spliced S. cerevisiae ORFs aligned with the predicted translation products of orthologous or highly similar ORFs from other fungi, when available. Ambiguous amino acids are indicated by an "X."
      • ORF DNA: This option displays the genomic DNA (i.e. introns have not been removed) of S. cerevisiae genes aligned with orthologous or highly similar predicted ORFs from other fungi, when available. Ambiguous nucleotides are indicated by an "N."
      • Upstream sequence: This option displays an alignment of the genomic DNA (i.e. introns have not been removed) of the 1 kb of sequence directly upstream of the orthologous or highly similar ORFs of interest. If a full 1 kb of sequence is not available (e.g. the ORF is near the end of a contig), all available sequence will be displayed. Ambiguous nucleotides are indicated by an "N."
      • Downstream sequence: This option displays an alignment of the genomic DNA (i.e. introns have not been removed) of the 1 kb of sequence directly downstream of the orthologous or highly similar ORFs of interest. If a full 1 kb of sequence is not available (e.g. the ORF is near the end of a contig), all available sequence will be displayed. Ambiguous nucleotides are indicated by an "N."
      • ORF DNA +1kb up/downstream: This option displays an alignment of the genomic DNA (i.e. introns have not been removed) of orthologous or highly similar ORFs of interest, along with 1 kb of sequence both up- and downstream. If a full 1 kb of sequence is not available (e.g. the ORF is near the end of a contig), all available sequence will be displayed. Ambiguous nucleotides are indicated by an "N."
    3. Click on the Align button to request alignment.
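The flanking-sequence options above can be sketched as follows. This assumes 0-based, half-open ORF coordinates on a contig and a strand of "+" or "-"; the names are hypothetical and this is not SGD's code. "Upstream" is taken relative to the ORF's own orientation, so for a minus-strand ORF it lies at higher contig coordinates and is reverse-complemented. Flanks are clipped at the contig ends, which is why less than 1 kb may be shown for an ORF near the end of a contig.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (ambiguous N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def flanking(contig: str, start: int, end: int, strand: str,
             flank: int = 1000) -> tuple[str, str, str]:
    """Return (upstream, orf, downstream), each written 5'->3' on the ORF's strand.

    start/end are 0-based, half-open contig coordinates of the ORF.
    Flanks shorter than `flank` are returned when the contig ends first.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    left = contig[max(0, start - flank):start]   # clipped at the contig start
    orf = contig[start:end]
    right = contig[end:end + flank]              # slicing clips at the contig end
    if strand == "+":
        return left, orf, right
    # Minus strand: upstream is to the right on the contig.
    return revcomp(right), revcomp(orf), revcomp(left)


# Toy contig: a 9-nt "ORF" at [4, 13) flanked by short runs of G and C.
up, orf, down = flanking("GGGGATGAAATAGCCCC", 4, 13, "+", flank=3)
print(up, orf, down)
```

The "ORF DNA +1kb up/downstream" option corresponds to joining the three pieces, e.g. `"".join(flanking(contig, start, end, strand))`.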

Download Options

There are two different download options available on this page. Both options provide sequences in FASTA format.

The Download button is located underneath the Align button (in the section with the periwinkle background). Clicking on this button will download selected sequences, from selected species, without aligning them. This option is provided as an alternative to aligning the sequences at SGD.

The FASTA button is located beneath the aligned sequences. Clicking this button will download the sequences included in the alignment, i.e. sequences that are currently displayed. This download option is provided as an alternative to the GCG format sequences, which are displayed at the bottom of the page.
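Either download can be read with a few lines of code. This is a minimal sketch that assumes standard FASTA layout (a ">" header line followed by sequence lines). The exact header text of SGD's files is not specified here, so the sketch keys each record by the first whitespace-delimited word of its header. The function name and the example labels and sequences are hypothetical toy values. For larger work, Biopython's `SeqIO.parse(handle, "fasta")` reads the same format.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Map each record's ID (first word of its ">" header) to its sequence."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # close the previous record
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


example = ">SGD_Scer_ACT1/YFL039C\nMDSEVAALVI\nDNGSG\n>MIT_Spar_c514_8148\nMDSEVAALVI\n"
for label, seq in read_fasta(example).items():
    print(label, len(seq))
```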

Symbol Key

The alignments are color-coded to indicate the degree of sequence similarity. They are also labeled with symbols (bottom row of the alignment) indicating the degree of sequence similarity. The "strong" and "weak" similarity groups are determined using the Gonnet PAM 250 matrix (strong similarity: score > 0.5; weak similarity: a positive score ≤ 0.5).

Similarity | identical | strong similarity | weak similarity
Symbol | * | : | .
Conserved amino acid groups | exact matches only | one of the "strong" groups below | one of the "weak" groups below

Strong similarity: the conserved position contains amino acids from one of the "strong" groups listed below (each row is a group):
STA
NEQK
NHQK
NDEQ
QHRK
MILV
MILF
HY
FYW

Weak similarity: the conserved position contains amino acids from one of the "weak" groups listed below (each row is a group):
CSA
ATV
SAG
STNK
STPA
SGND
SNDEQK
NDEQHK
NEQHRK
FVLIM
HFY
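For protein alignments, the symbol rules can be sketched as follows, using the strong and weak groups listed above. The function names are hypothetical. The sketch assumes that any column containing a gap gets no symbol, consistent with the note in the Sequence Alignment section that similarity symbols require every aligned sequence to be present; SGD's exact implementation may differ.

```python
# Conservation groups from the Symbol Key above (each string is one group).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK",
               "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Return "*", ":", "." or " " for one column (one character per sequence)."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # a gap in any sequence: no similarity symbol
    if len(residues) == 1:
        return "*"  # identical in every sequence
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall within one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall within one weak group
    return " "


def symbol_row(aligned: list[str]) -> str:
    """Build the symbol row for equal-length aligned protein sequences."""
    return "".join(column_symbol("".join(col)) for col in zip(*aligned))


print(symbol_row(["MSTA-L", "MSSA-I", "MTTAAV"]))  # prints: *::* :
```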


Sequence Alignment

Sequences from all available and selected fungi are aligned; if a species does not appear in the alignment, then this gene or protein sequence is not available for that species (please also see the Caveats section). Each row is labeled on the left-hand side of the page. Clicking on the hyperlinked label will take you to the appropriate sequence page, which allows retrieval of the sequence in either GCG or FASTA format. This label contains three important pieces of information, in the form source_species_ORF:

source: At present we have sequences from three different sources: SGD, MIT, and Washington University (WashU). Some species are available in duplicate because they were provided independently by both the MIT and WashU groups.
species: The species is indicated with an abbreviated form of the species name. See the Sequences table for a key to the names.
ORF: This ORF name uniquely identifies the ORF within each species. The format of ORF identifiers varies by source:

  • SGD (S. cerevisiae) sequences are labeled with the standard and systematic names of the gene (e.g. ACT1/YFL039C).
  • MIT sequences are labeled with the contig number (e.g. c514) and a predicted ORF (pORF) number that is unique in each fungal species (e.g. 8148). Note that since the pORF numbers were independently assigned in each fungal species, there is no expectation that homologs will have the same pORF number.
  • WashU ORFs are labeled with a number that incorporates both the contig (number before the decimal) and the predicted ORF number (number after the decimal).
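A label can be split into its three parts as sketched below. This assumes that the source and species abbreviations never contain an underscore (true of every abbreviation in the Sequences table), so splitting on the first two underscores leaves the ORF part intact however it is formatted. The function names and example labels are hypothetical. The WashU helper keeps the ID as text and splits it at the decimal point, because converting it to a floating-point number would make, for example, 12.1 and 12.10 indistinguishable.

```python
from typing import NamedTuple


class Label(NamedTuple):
    source: str   # e.g. SGD, MIT, WashU
    species: str  # abbreviated species name, e.g. Scer
    orf: str      # source-specific ORF identifier, kept as text


def parse_label(label: str) -> Label:
    """Split a source_species_ORF label on its first two underscores."""
    parts = label.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"not a source_species_ORF label: {label!r}")
    return Label(*parts)


def washu_contig_and_porf(orf: str) -> tuple[int, int]:
    """Split a WashU ORF ID into (contig, predicted ORF) at the decimal point."""
    contig, porf = orf.split(".", 1)
    return int(contig), int(porf)


print(parse_label("SGD_Scer_ACT1/YFL039C"))
print(washu_contig_and_porf(parse_label("WashU_Skud_684.21").orf))  # (684, 21)
```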

Aligned sequences are displayed in rows of 50 residues or gaps. The residues are numbered independently for each species, on either side of each line of sequence. Note that although gaps (indicated with a dash) take up space on the line, they do not contribute to the residue count.
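The numbering rule can be sketched as follows. The residue counter advances only for non-gap characters, so the numbers at each end of a line count residues, not columns. The function name is hypothetical, and how SGD numbers a line that contains only gaps is not stated; in this sketch such a line simply repeats the running count.

```python
def numbered_lines(aligned: str, width: int = 50) -> list[tuple[int, str, int]]:
    """Split one aligned sequence into (first, chunk, last) tuples of `width` columns.

    `first` and `last` number the first and last residues on the line;
    gaps ("-") occupy columns but are not counted.
    """
    lines = []
    count = 0
    for i in range(0, len(aligned), width):
        chunk = aligned[i:i + width]
        residues = sum(1 for c in chunk if c != "-")
        first = count + 1 if residues else count
        count += residues
        lines.append((first, chunk, count))
    return lines


for first, chunk, last in numbered_lines("MD-SE", width=2):
    print(f"{first:>4} {chunk:<2} {last}")
```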

Aligned sequences are color-coded to indicate the degree of sequence similarity. They are also labeled with symbols (bottom, "Symbol" row of the alignment) indicating the degree of sequence similarity. Note that a position is labeled only if every aligned sequence is present there; this means that if one of the aligned sequences is truncated relative to the others, the corresponding positions in the remaining sequences will not be labeled to indicate similarity. See the Symbol Key section for a further explanation of the similarity labeling.

Individual Sequences

Individual protein or DNA sequences for each fungal species are listed at the bottom of the page. The sequences in this section are in GCG format and are of the sequence type selected for alignment. A file listing all sequences selected for alignment, in FASTA format, can be downloaded by clicking on the "FASTA" button. Sequences can also be retrieved (in either GCG or FASTA format, and in all available sequence types) by clicking on the hyperlinked label to the left of the Sequence Alignment.

Alignment Methods

Various researchers (see the Sequences: Sources and References section) predicted ORFs for other fungi and assigned orthology to S. cerevisiae genes based on sequence conservation and synteny. Predicted ORFs are included in each alignment as specified by the researchers. Alignments are made on the fly using ClustalW.
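Sequences saved with the Download button can be aligned locally with ClustalW. The sketch below assumes a local ClustalW 2 installation whose binary is named `clustalw2` (ClustalW 1.x installs as `clustalw`); the function names and file names are hypothetical. SGD's exact ClustalW settings are not given here, so this uses ClustalW defaults and may not reproduce the on-site alignment exactly.

```python
import subprocess


def clustalw_command(infile: str, outfile: str, protein: bool = True,
                     exe: str = "clustalw2") -> list[str]:
    """Build a ClustalW command line that aligns the sequences in a FASTA file."""
    seq_type = "PROTEIN" if protein else "DNA"
    return [exe, f"-INFILE={infile}", "-ALIGN", f"-TYPE={seq_type}", f"-OUTFILE={outfile}"]


def run_clustalw(infile: str, outfile: str, protein: bool = True,
                 exe: str = "clustalw2") -> None:
    """Run ClustalW; requires the executable to be installed and on PATH."""
    subprocess.run(clustalw_command(infile, outfile, protein, exe), check=True)


print(" ".join(clustalw_command("my_sequences.fasta", "my_sequences.aln")))
```

Use `protein=False` for the DNA download types (ORF DNA and the upstream/downstream options).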

Sequences: Sources and References

Abbreviation | Fungal species | Group | Source of sequence | Published reference
SGD_Scer | Saccharomyces cerevisiae | sensu stricto | Current reference sequence stored in SGD |
MIT_Spar | Saccharomyces paradoxus | sensu stricto | Manolis Kellis, Eric Lander, Bruce Birren and coworkers at the Whitehead Institute at MIT | Kellis et al.
MIT_Smik | Saccharomyces mikatae | sensu stricto | Manolis Kellis, Eric Lander, Bruce Birren and coworkers at the Whitehead Institute at MIT | Kellis et al.
WashU_Smik | Saccharomyces mikatae | sensu stricto | Paul Cliften, Mark Johnston and coworkers at Washington University | Cliften et al.
MIT_Sbay | Saccharomyces bayanus | sensu stricto | Manolis Kellis, Eric Lander, Bruce Birren and coworkers at the Whitehead Institute at MIT | Kellis et al.
WashU_Sbay | Saccharomyces bayanus | sensu stricto | Paul Cliften, Mark Johnston and coworkers at Washington University, and Manolis Kellis, Eric Lander, Bruce Birren and coworkers at the Whitehead Institute at MIT (note: this WashU assembly combined published sequencing data from both the WashU and MIT sequencing efforts) | Cliften et al.; Kellis et al.
WashU_Skud | Saccharomyces kudriavzevii | sensu stricto | Paul Cliften, Mark Johnston and coworkers at Washington University | Cliften et al.
WashU_Scas | Saccharomyces castellii | sensu lato | Paul Cliften, Mark Johnston and coworkers at Washington University | Cliften et al.
WashU_Sklu | Saccharomyces kluyveri | sensu lato | Paul Cliften, Mark Johnston and coworkers at Washington University | Cliften et al.


Notes on the species and sources:

  1. There is significant sequence variation between the two different strains of S. mikatae sequenced by Washington University and MIT, making a joint assembly difficult.
  2. The Saccharomyces genome sequences available here for alignments fall into two groups, Saccharomyces sensu stricto and sensu lato (contributed by Mark Johnston):
    • the sensu stricto species (S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus) are relatively closely related to S. cerevisiae, and are best for identifying conserved non-protein-coding sequences.
    • the sensu lato species (S. castellii and S. kluyveri) are more distantly related and are best for identifying conserved protein sequences.
    • If you wish to identify conserved non-protein-coding sequences, we recommend aligning the S. cerevisiae query sequence with only the sensu stricto species' sequences. The sensu lato sequences are so diverged from S. cerevisiae that very few of their non-protein-coding sequences will align to the corresponding S. cerevisiae sequences.
    • If you wish to identify conserved sequences in proteins, we recommend starting by aligning the S. cerevisiae sequence to just the sensu lato species' sequences (S. castellii and S. kluyveri). In most cases this will best reveal the conserved sequences; alignment to the sensu stricto species' sequences is not expected to provide much additional information.
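The recommendations above can be encoded as a small selection helper. This is a sketch that assumes the available sequences are identified by the source_species abbreviations from the Sequences table; the function name and the goal strings are hypothetical.

```python
# Abbreviations from the Sequences table, grouped as in the notes above.
QUERY = "SGD_Scer"
SENSU_STRICTO = {"MIT_Spar", "MIT_Smik", "WashU_Smik", "MIT_Sbay", "WashU_Sbay", "WashU_Skud"}
SENSU_LATO = {"WashU_Scas", "WashU_Sklu"}


def recommended_selection(available: list[str], goal: str) -> list[str]:
    """Pick the sequences to align with the S. cerevisiae query for a given goal.

    goal: "noncoding" (conserved non-protein-coding sequence) or "protein".
    The order of `available` is preserved; the query always comes first.
    """
    if goal == "noncoding":
        group = SENSU_STRICTO
    elif goal == "protein":
        group = SENSU_LATO
    else:
        raise ValueError(f"unknown goal: {goal!r}")
    return [QUERY] + [name for name in available if name in group]


avail = ["SGD_Scer", "MIT_Spar", "WashU_Scas", "WashU_Sklu", "WashU_Skud"]
print(recommended_selection(avail, "protein"))
```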

Caveats

  1. The ORF predictions by Kellis et al and by Cliften et al did NOT consider introns. As a result, the orthologs of intron-containing S. cerevisiae ORFs will often appear truncated or only partially aligned relative to the S. cerevisiae ORF. These results are simply a consequence of the methods used to predict ORFs in the other Saccharomyces species and do not indicate genuine divergence between S. cerevisiae and the other species.
  2. The analysis by Kellis et al considered only predicted ORFs that were at least 50 amino acids long and that did not contain an intron within the first 150 nucleotides of genomic DNA. There are many known S. cerevisiae ORFs that would not meet these requirements. As a consequence, there are no S. paradoxus, S. mikatae and S. bayanus orthologs (and few alignments) predicted for S. cerevisiae genes of this unusually small size. A list of genes that were excluded from the analysis for this reason is available from SGD.
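The two criteria in this caveat can be expressed as a simple predicate, for example to flag S. cerevisiae genes for which no S. paradoxus, S. mikatae or S. bayanus ortholog from Kellis et al is expected. This is a sketch: the function name is hypothetical, and how Kellis et al measured ORF length and intron position is assumed here (length of the predicted product in amino acids; intron position as a 0-based offset from the ORF start in genomic DNA).

```python
from typing import Optional


def passes_kellis_filter(protein_length_aa: int,
                         first_intron_offset_nt: Optional[int] = None) -> bool:
    """Apply the length and early-intron criteria described in the caveat.

    protein_length_aa: length of the predicted product in amino acids.
    first_intron_offset_nt: 0-based offset of the first intron from the ORF
        start in genomic DNA, or None if the ORF has no intron.
    """
    long_enough = protein_length_aa >= 50
    # An intron starting at offsets 0-149 lies within the first 150 nucleotides.
    no_early_intron = first_intron_offset_nt is None or first_intron_offset_nt >= 150
    return long_enough and no_early_intron


print(passes_kellis_filter(45))        # False: shorter than 50 amino acids
print(passes_kellis_filter(300, 20))   # False: intron within the first 150 nt
```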

