SGD Help: Fungal Sequence Alignment
Saccharomyces Genome Database






Overview

This resource displays Saccharomyces cerevisiae sequences aligned with predicted orthologs in several other fungi. See the Sequences: Sources and References section for a current list.

Display and Navigation

Dendrogram: The top section (boxed in blue) shows a dendrogram illustrating the similarity of the aligned sequences. Species are color-coded according to the source of the sequence. If the common regions of the aligned sequences are identical, then this dendrogram cannot be generated and will not be included.
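The page does not say how the dendrogram is computed. As an illustration only, the Python sketch below computes a simple pairwise distance (the fraction of mismatched residues) over the "common region", taken here to mean the alignment columns in which no sequence has a gap. It returns `None` when every pair is identical there, which mirrors the case in which no dendrogram is drawn. The function names and the choice of distance are assumptions, not SGD's method.

```python
from __future__ import annotations

from itertools import combinations


def common_columns(aligned: dict[str, str]) -> list[int]:
    """Indices of alignment columns in which no sequence has a gap ('-')."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    (length,) = lengths
    return [i for i in range(length)
            if all(seq[i] != "-" for seq in aligned.values())]


def pairwise_p_distances(aligned: dict[str, str]) -> dict[tuple[str, str], float] | None:
    """Fraction of mismatched residues for each pair of sequences, counted only
    over the columns shared by all sequences. Returns None when there is no
    common region or when every pair is identical there (nothing to draw)."""
    cols = common_columns(aligned)
    if not cols:
        return None
    distances = {}
    for a, b in combinations(sorted(aligned), 2):
        mismatches = sum(1 for i in cols if aligned[a][i] != aligned[b][i])
        distances[(a, b)] = mismatches / len(cols)
    if all(d == 0.0 for d in distances.values()):
        return None
    return distances
```

A tree could then be built from these distances with a clustering method such as UPGMA or neighbor joining; which method SGD uses is not stated here.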

Alignment Options:

The default alignment: The default alignment is already displayed below the Symbol Key. When retrieving alignments from the "Fungal alignment" on the Locus page, the default is a protein alignment of the "Best Hits" or "Orthologs" from all available fungal species. When retrieving alignments from the synteny viewer, the default is a protein alignment of all species contributed by Kellis et al.

The Alignment Options section (periwinkle background) lets you request an alignment other than the default. To view a different alignment, select the species/sequences to include and the type of sequence (see below), then click the "Align" button.

Select the species/sequences: There are two separate dialog boxes that display the names of sequences available for alignment.
  1. The Best Hits & Orthologs box lists the sequences that researchers have identified as orthologs or reciprocal best hits. Due to either incomplete sequence coverage or divergence between species, not all genes are available for all species. If a species or sequence is not shown, then it is not available. For information on the evolutionary relationships between these species, see the phylogenetic tree at the Washington University site. For advice on selecting species to compare, see the note on Saccharomyces groups, below (contributed by Mark Johnston).

  2. The Other Hits box lists predicted ORFs that have a high degree of similarity to this S. cerevisiae ORF, but that were not identified as orthologs or reciprocal best hits with this or any other S. cerevisiae ORF. Note that not all closely related predicted ORFs are listed in this box. A complete list of similar predicted ORFs can be retrieved by using the Fungal Blast Tool.
Select two or more sequences by holding down the Control (PC) or Command (Mac) key while clicking. You must choose at least two sequences for alignment, from either or both boxes.

Select the type of alignment: Choose one of the following options by clicking on it.
Protein: This is the default option from both the synteny viewer and the locus page. It displays predicted translation products of spliced S. cerevisiae ORFs aligned with the predicted translation products of orthologous or highly similar ORFs from other fungi, when available. Ambiguous amino acids are indicated by an "X."
ORF DNA: This option displays the genomic DNA (i.e. introns have not been removed) of S. cerevisiae genes aligned with orthologous or highly similar predicted ORFs from other fungi, when available. Ambiguous nucleotides are indicated by an "N."
Upstream sequence: This option displays an alignment of the genomic DNA (i.e. introns have not been removed) of the 1 kb of sequence directly upstream of the orthologous or highly similar ORFs of interest. If a full 1 kb of sequence is not available (e.g. the ORF is near the end of a contig), all available sequence will be displayed. Ambiguous nucleotides are indicated by an "N."
Downstream sequence: This option displays an alignment of the genomic DNA (i.e. introns have not been removed) of the 1 kb of sequence directly downstream of the orthologous or highly similar ORFs of interest. If a full 1 kb of sequence is not available (e.g. the ORF is near the end of a contig), all available sequence will be displayed. Ambiguous nucleotides are indicated by an "N."
ORF DNA +1kb up/downstream: This option displays an alignment of the genomic DNA (i.e. introns have not been removed) of orthologous or highly similar ORFs of interest, along with 1 kb of sequence both up- and downstream. If a full 1 kb of sequence is not available (e.g. the ORF is near the end of a contig), all available sequence will be displayed. Ambiguous nucleotides are indicated by an "N."
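The 1 kb flanking rule, including the case where less sequence is available near a contig end, can be sketched in Python as follows. The sketch assumes ORF coordinates are 1-based and inclusive on a contig string and that "upstream" is read relative to the ORF's own strand (so for a minus-strand ORF it lies at higher contig coordinates and is reverse-complemented). The function `extract_region` and its `kind` values are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCAntgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract_region(contig: str, start: int, end: int, strand: str,
                   kind: str, flank: int = 1000) -> str:
    """Return the requested region for an ORF at contig[start..end]
    (1-based, inclusive), read 5'->3' on the ORF's strand. Flanks are
    clipped at the contig ends, so they may be shorter than `flank`."""
    if start < 1 or end > len(contig) or start > end:
        raise ValueError("ORF coordinates fall outside the contig")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    plus = strand == "+"
    if kind == "orf":
        lo, hi = start, end
    elif kind == "upstream":
        lo, hi = (start - flank, start - 1) if plus else (end + 1, end + flank)
    elif kind == "downstream":
        lo, hi = (end + 1, end + flank) if plus else (start - flank, start - 1)
    elif kind == "orf_with_flanks":
        lo, hi = start - flank, end + flank
    else:
        raise ValueError(f"unknown region kind: {kind}")
    lo, hi = max(lo, 1), min(hi, len(contig))  # clip at the contig ends
    region = contig[lo - 1:hi] if lo <= hi else ""
    return region if plus else reverse_complement(region)
```

Clipping with `max`/`min` is what makes a region near a contig end come back shorter than 1 kb instead of raising an error.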

Download Options: There are two different download options available on this page. Both options provide sequences in FASTA format.
  1. The "Download" button is located underneath the "Align" button. Clicking on this button will download selected sequences, from selected species, without aligning them. This option is provided as an alternative to aligning the sequences at SGD.

  2. The "FASTA" button is located beneath the aligned sequences. Clicking this button will download the sequences included in the alignment, i.e. sequences that are currently displayed. This download option is provided as an alternative to the GCG format sequences, which are displayed at the bottom of the page.
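Both download options produce FASTA text. A minimal Python parser for such a file is sketched below, assuming that the first word of each `>` header line names the sequence; the exact header format of the SGD downloads is not documented on this page, and `parse_fasta` is a hypothetical helper.

```python
from __future__ import annotations


def parse_fasta(text: str) -> dict[str, str]:
    """Map each sequence name (first word after '>') to its sequence.
    Sequence lines are concatenated; gap characters ('-') are kept."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a name")
            name, chunks = fields[0], []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

In this sketch a later record with the same name overwrites an earlier one.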

Symbol Key: The alignments are color-coded and labeled with symbols (bottom row of the alignment) to indicate the degree of sequence similarity at each position. The "strong" and "weak" similarity groups are determined using the Gonnet PAM250 matrix (strong similarity: score > 0.5; weak similarity: positive score ≤ 0.5).

Similarity          Symbol   Conserved amino acid groups
identical           *        exact matches only
strong similarity   :        the conserved position contains amino acids from one of the "strong" groups listed below
weak similarity     .        the conserved position contains amino acids from one of the "weak" groups listed below

"Strong" groups (each row is a group):

                 STA
                 NEQK
                 NHQK
                 NDEQ
                 QHRK
                 MILV
                 MILF
                 HY
                 FYW

"Weak" groups (each row is a group):

                 CSA
                 ATV
                 SAG
                 STNK
                 STPA
                 SGND
                 SNDEQK
                 NDEQHK
                 NEQHRK
                 FVLIM
                 HFY
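To make the Symbol Key concrete, here is a minimal Python sketch of how each column of an alignment could be assigned a symbol from the groups listed above. It follows the ClustalW convention that a gap in any sequence suppresses the label, and applies the strong/weak groups only to protein alignments. The names `column_symbol` and `symbol_row` are hypothetical, and SGD's own implementation may differ.

```python
from __future__ import annotations

STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str, protein: bool = True) -> str:
    """Symbol for one alignment column (one character per sequence)."""
    residues = set(column.upper())
    if "-" in residues:
        return " "   # a gap in any sequence: no similarity label
    if len(residues) == 1:
        return "*"   # identical in every sequence
    if protein and any(residues <= set(g) for g in STRONG_GROUPS):
        return ":"   # all residues fall within one strong group
    if protein and any(residues <= set(g) for g in WEAK_GROUPS):
        return "."   # all residues fall within one weak group
    return " "


def symbol_row(sequences: list[str], protein: bool = True) -> str:
    """Symbol row for equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(column_symbol("".join(col), protein) for col in zip(*sequences))
```

For example, `symbol_row(["MKSLVAC", "MKTIV-W", "MRALFAG"])` gives `"*:::.  "`: the gap in the sixth column leaves it unlabeled even though the other two sequences agree there.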


Sequence Alignment: Sequences from all available and selected fungi are aligned. If a species does not appear in the alignment, then the gene or protein sequence is not available for that species (see also the Caveats section). Each row is labeled on the left-hand side of the page. Clicking the hyperlinked label takes you to the corresponding sequence page, where the sequence can be retrieved in either GCG or FASTA format. Each label contains three pieces of information, in the form source_species_ORF:
  1. source: At present we have sequences from three different sources: SGD, MIT, and Washington University (WashU). Some species are available in duplicate because they were provided independently by both the MIT and WashU groups.

  2. species: The species is indicated with an abbreviated form of the species name. See the Sequences table for a key to the names.

  3. ORF: This ORF name uniquely identifies the ORF within each species. The format of ORF identifiers varies by source.
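The source_species_ORF label can be decoded mechanically. This Python sketch assumes the label is exactly a source, a species abbreviation, and an ORF name joined by underscores, with any further underscores belonging to the ORF name (since ORF formats vary by source). `parse_label` and `RowLabel` are hypothetical names; the species key is copied from the Sequences: Sources and References table.

```python
from typing import NamedTuple

SOURCES = {"SGD", "MIT", "WashU"}
SPECIES = {
    "Scer": "Saccharomyces cerevisiae",
    "Spar": "Saccharomyces paradoxus",
    "Smik": "Saccharomyces mikatae",
    "Sbay": "Saccharomyces bayanus",
    "Skud": "Saccharomyces kudriavzevii",
    "Scas": "Saccharomyces castellii",
    "Sklu": "Saccharomyces kluyveri",
}


class RowLabel(NamedTuple):
    source: str
    species: str
    orf: str


def parse_label(label: str) -> RowLabel:
    """Split a row label into source, full species name, and ORF name."""
    parts = label.split("_", 2)  # the ORF part may itself contain "_"
    if len(parts) != 3 or not parts[2]:
        raise ValueError(f"expected source_species_ORF, got {label!r}")
    source, abbrev, orf = parts
    if source not in SOURCES:
        raise ValueError(f"unknown source {source!r}")
    if abbrev not in SPECIES:
        raise ValueError(f"unknown species abbreviation {abbrev!r}")
    return RowLabel(source, SPECIES[abbrev], orf)
```

For example, `parse_label("SGD_Scer_YAL001C")` returns `RowLabel(source='SGD', species='Saccharomyces cerevisiae', orf='YAL001C')`.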

Aligned sequences are displayed in rows of 50 residues or gaps. The residues are numbered independently for each species, on either side of each line of sequence. Note that although gaps (indicated with a dash) take up space on the line, they do not contribute to the residue count.
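The numbering rule (rows of 50 columns, residues counted per species, gaps taking space but not counted) can be sketched in Python as below. How the page numbers a row that contains only gaps is not stated; this sketch simply repeats the running count on both sides, and `number_rows` is a hypothetical name.

```python
from __future__ import annotations


def number_rows(aligned: dict[str, str], width: int = 50) -> list[tuple[str, int, str, int]]:
    """Return (name, first residue number, row text, last residue number)
    for every displayed row; gaps ('-') are shown but not counted."""
    lengths = {len(s) for s in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    (length,) = lengths
    counts = dict.fromkeys(aligned, 0)  # residues seen so far, per sequence
    rows = []
    for offset in range(0, length, width):
        for name, seq in aligned.items():
            chunk = seq[offset:offset + width]
            residues = sum(1 for c in chunk if c != "-")
            first = counts[name] + 1 if residues else counts[name]
            counts[name] += residues
            rows.append((name, first, chunk, counts[name]))
    return rows
```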

Aligned sequences are color-coded to indicate the degree of sequence similarity. They are also labeled with symbols (bottom, "Symbol" row of the alignment) indicating the degree of sequence similarity. Note that similarity indicators appear only at positions where every aligned sequence has a residue; if one of the aligned sequences is truncated relative to the others, the corresponding positions in the remaining sequences will not be labeled to indicate similarity. See the Symbol Key section for a further explanation of the similarity labeling.

Individual Sequences: Individual protein or DNA sequences for each fungal species are listed at the bottom of the page. The sequences in this section are in GCG format and are of the sequence type selected for alignment. A file listing all sequences selected for alignment, in FASTA format, can be downloaded by clicking the "FASTA" button. Sequences can also be retrieved (in either GCG or FASTA format, and in all available sequence types) by clicking the hyperlinked label to the left of the Sequence Alignment.

Alignment Methods

Various researchers (see the Sequences: Sources and References section) predicted ORFs for other fungi and assigned orthology to S. cerevisiae genes based on sequence conservation and synteny. Predicted ORFs are included in each alignment, as specified by the researchers. Alignments are made on the fly, using ClustalW.
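To reproduce a similar alignment locally from a downloaded FASTA file, one could call the ClustalW command-line program. The parameters SGD passes to ClustalW are not documented here, so this Python sketch uses only the basic ClustalW options `-INFILE=`, `-ALIGN`, `-TYPE=PROTEIN|DNA`, and `-OUTFILE=`; the executable name (`clustalw2` versus `clustalw`) depends on your installation, and the helper names are hypothetical.

```python
import subprocess


def clustalw_command(infile: str, seq_type: str, outfile: str,
                     executable: str = "clustalw2") -> list[str]:
    """Build a ClustalW command line for a full multiple alignment."""
    seq_type = seq_type.upper()
    if seq_type not in ("PROTEIN", "DNA"):
        raise ValueError("seq_type must be PROTEIN or DNA")
    return [executable, f"-INFILE={infile}", "-ALIGN",
            f"-TYPE={seq_type}", f"-OUTFILE={outfile}"]


def run_clustalw(infile: str, seq_type: str, outfile: str,
                 executable: str = "clustalw2") -> None:
    """Run ClustalW (must be installed locally); raises
    subprocess.CalledProcessError if the program exits with an error."""
    subprocess.run(clustalw_command(infile, seq_type, outfile, executable),
                   check=True)
```

Because SGD's alignment parameters may differ from ClustalW's defaults, a local alignment is not guaranteed to match the one displayed on this page.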

Sequences: Sources and References

Abbreviation | Fungal species | Group | Source of sequence | Published reference
SGD_Scer | Saccharomyces cerevisiae | sensu stricto | Current reference sequence stored in SGD |
MIT_Spar | Saccharomyces paradoxus | sensu stricto | Manolis Kellis, Eric Lander, Bruce Birren and coworkers at the Whitehead Institute at MIT | Kellis et al
MIT_Smik | Saccharomyces mikatae | sensu stricto | Manolis Kellis, Eric Lander, Bruce Birren and coworkers at the Whitehead Institute at MIT | Kellis et al
WashU_Smik | Saccharomyces mikatae | sensu stricto | Paul Cliften, Mark Johnston and coworkers at Washington University | Cliften et al
MIT_Sbay | Saccharomyces bayanus | sensu stricto | Manolis Kellis, Eric Lander, Bruce Birren and coworkers at the Whitehead Institute at MIT | Kellis et al
WashU_Sbay | Saccharomyces bayanus | sensu stricto | Paul Cliften, Mark Johnston and coworkers at Washington University, and Manolis Kellis, Eric Lander, Bruce Birren and coworkers at the Whitehead Institute at MIT (note: this WashU assembly combined published sequencing data from both the WashU and MIT sequencing efforts) | Cliften et al; Kellis et al
WashU_Skud | Saccharomyces kudriavzevii | sensu stricto | Paul Cliften, Mark Johnston and coworkers at Washington University | Cliften et al
WashU_Scas | Saccharomyces castellii | sensu lato | Paul Cliften, Mark Johnston and coworkers at Washington University | Cliften et al
WashU_Sklu | Saccharomyces kluyveri | sensu lato | Paul Cliften, Mark Johnston and coworkers at Washington University | Cliften et al


Notes on the species and sources:

1. There is significant sequence variation between the two different strains of S. mikatae sequenced by Washington University and MIT, making a joint assembly difficult.

2. For an indication of the evolutionary relationships between these species, see the phylogenetic tree provided by Washington University.

3. Saccharomyces sensu stricto vs. sensu lato groups (contributed by Mark Johnston):

The Saccharomyces genome sequences available here for alignments fall into two groups:

  1. the sensu stricto species (S. paradoxus, S. mikatae, S. kudriavzevii, S. bayanus) are relatively closely related to S. cerevisiae, and are best for identifying conserved non-protein-coding sequences.

  2. the sensu lato species (S. castellii and S. kluyveri) are more distantly related and are best for identifying conserved protein sequences.

If you wish to identify conserved non-protein-coding sequences, we recommend aligning the S. cerevisiae query sequence with the sensu stricto species sequences only. The sensu lato sequences are so diverged from S. cerevisiae that very few of their non-protein-coding sequences will align with the corresponding S. cerevisiae sequences.

If you wish to identify conserved sequences in proteins, we recommend starting by aligning the S. cerevisiae sequence to just the sensu lato species' sequences (S. castellii and S. kluyveri). In most cases this will best reveal the conserved sequences. Alignment to the sensu stricto species' sequences is not expected to provide much additional information in most cases.
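The recommendation above amounts to choosing sequences by group. This Python sketch encodes the group column of the Sources and References table and returns the labels to select for each goal; the goal names `"noncoding"` and `"protein"` and the function `recommended_selection` are hypothetical.

```python
# Group of each sequence source, copied from the Sources and References table.
GROUP = {
    "SGD_Scer": "sensu stricto",
    "MIT_Spar": "sensu stricto",
    "MIT_Smik": "sensu stricto",
    "WashU_Smik": "sensu stricto",
    "MIT_Sbay": "sensu stricto",
    "WashU_Sbay": "sensu stricto",
    "WashU_Skud": "sensu stricto",
    "WashU_Scas": "sensu lato",
    "WashU_Sklu": "sensu lato",
}
QUERY = "SGD_Scer"


def recommended_selection(goal: str) -> list[str]:
    """S. cerevisiae query plus the group recommended for the goal:
    sensu stricto for non-protein-coding sequence, sensu lato for proteins."""
    wanted = {"noncoding": "sensu stricto", "protein": "sensu lato"}.get(goal)
    if wanted is None:
        raise ValueError("goal must be 'noncoding' or 'protein'")
    others = sorted(a for a, g in GROUP.items() if g == wanted and a != QUERY)
    return [QUERY] + others
```

Note that the sensu stricto selection includes both the MIT and WashU versions of S. mikatae and S. bayanus; you may prefer to keep only one of each.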

Caveats

1. The ORF predictions by Kellis et al and by Cliften et al did NOT consider introns. As a result, the orthologs of intron-containing S. cerevisiae ORFs will often appear truncated or only partially aligned relative to the S. cerevisiae ORF. These results are simply a consequence of the methods used to predict ORFs in the other Saccharomyces species and do not indicate genuine divergence between S. cerevisiae and the other species.

2. The analysis by Kellis et al considered only predicted ORFs that were at least 50 amino acids long and that did not contain an intron within the first 150 nucleotides of genomic DNA. Many known S. cerevisiae ORFs do not meet these requirements. As a consequence, no S. paradoxus, S. mikatae, or S. bayanus orthologs (and few alignments) are predicted for S. cerevisiae genes that fail these criteria, such as unusually small genes. A list of the genes excluded from the analysis for this reason is available from SGD.
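The two Kellis et al criteria can be written as a simple predicate. This Python sketch assumes length is measured as the predicted product in amino acids and that intron positions are given as 1-based offsets, counted from the first nucleotide of the ORF in genomic DNA, at which each intron begins; `meets_kellis_criteria` is a hypothetical name, not part of the published analysis.

```python
from __future__ import annotations


def meets_kellis_criteria(protein_length: int, intron_starts: list[int]) -> bool:
    """True if the ORF is at least 50 amino acids long and no intron
    begins within the first 150 nucleotides of its genomic DNA."""
    if protein_length < 50:
        return False
    return all(pos > 150 for pos in intron_starts)
```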

