SGD Help: Align Strain Sequences

The S. cerevisiae Strain Sequence Alignment page displays protein or DNA sequences from a collection of Saccharomyces cerevisiae strain genomes. The alignments were produced using ClustalW.


  1. Dendrogram
  2. Alignment Options
  3. Color Key
  4. Sequence Alignment
  5. Download Option
  6. Individual Sequences


Dendrogram

The dendrogram illustrates the similarity of the aligned sequences. If the common regions of the aligned sequences are identical, the dendrogram cannot be generated and is omitted from the page.
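The following Python sketch illustrates why identical sequences leave nothing to draw: when every pairwise distance over the shared regions is zero, there is no branching information for a tree. It is not the method ClustalW uses to build the dendrogram. The function names, the gap characters, and the simple mismatch-fraction distance are all assumptions made for illustration.

```python
from itertools import combinations

# Assumption: "-" and "." mark positions where a sequence is absent.
GAP_CHARS = {"-", "."}


def pairwise_distance(a, b):
    """Fraction of mismatches over positions present in both sequences."""
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # only the common regions are compared
        compared += 1
        if x.upper() != y.upper():
            mismatches += 1
    return mismatches / compared if compared else 0.0


def can_draw_dendrogram(sequences):
    """True if at least one pair of aligned sequences differs."""
    pairs = combinations(sequences.values(), 2)
    return any(pairwise_distance(a, b) > 0 for a, b in pairs)
```

With this sketch, two strains whose sequences differ only by a truncation would give a distance of zero, so no dendrogram would be expected for them.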

Alignment Options

The default alignment is a Protein alignment of all available strain sequences. To view a DNA Coding sequence alignment instead, select it from the pulldown menu immediately beneath the dendrogram.

Color Key

The alignments are color-coded to indicate the degree of sequence similarity at each position: 100% identical = yellow, 90-99% identical = pink, 75-89% identical = green, <75% identical = gray. The percentage at each position is calculated only from the sequences that are present at that position. Regions where a sequence is truncated are ignored and therefore do not affect the calculation.
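The color assignment can be sketched in Python as follows. This is an illustration only, not the code behind the page. It assumes that "percent identical" means the share of present sequences carrying the most common residue at a position, that "-" and "." mark absent sequence, and that fractional values between the listed bands (for example 89.5%) fall into the lower band. All function names are hypothetical.

```python
from collections import Counter

# Assumption: "-" and "." mark positions where a sequence is absent.
GAP_CHARS = {"-", "."}


def column_identity(column):
    """Percent of present residues that match the most common residue."""
    residues = [c.upper() for c in column if c not in GAP_CHARS]
    if not residues:
        return None  # no sequence is present at this position
    top_count = Counter(residues).most_common(1)[0][1]
    return 100.0 * top_count / len(residues)


def color_for(identity):
    """Map a percent identity onto the color key described above."""
    if identity is None:
        return None
    if identity >= 100.0:
        return "yellow"
    if identity >= 90.0:
        return "pink"
    if identity >= 75.0:
        return "green"
    return "gray"


def color_alignment(rows):
    """Return one color per alignment column for equal-length rows."""
    return [color_for(column_identity(col)) for col in zip(*rows)]
```

For example, `color_alignment(["MKT", "MKS", "MK-"])` colors the first two columns yellow. The third column is scored from the two residues that are present, so the truncated third sequence does not affect the result.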

Sequence Alignment

Sequences from all available strains are aligned in 50-residue blocks. If a strain does not appear in the alignment, the gene or protein sequence is not available for that strain. Each row is labeled on the left-hand side of the page in the format [ORF]_[strain]. Clicking the hyperlinked label opens a new browser window from which the sequence can be retrieved in FASTA or GCG format.
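The block layout can be approximated with a short Python sketch. It assumes the aligned sequences are already available as equal-length strings keyed by their [ORF]_[strain] labels. The function names are hypothetical, and the ORF and strain names in the usage example are sample values only.

```python
def make_label(orf, strain):
    """Build a row label in the [ORF]_[strain] format."""
    return f"{orf}_{strain}"


def format_blocks(sequences, width=50):
    """Lay out aligned sequences in blocks of `width` residues.

    `sequences` maps each row label to an aligned sequence; all
    sequences are assumed to have the same length.
    """
    if not sequences:
        return ""
    labels = list(sequences)
    length = len(sequences[labels[0]])
    pad = max(len(label) for label in labels)
    blocks = []
    for start in range(0, length, width):
        lines = [
            f"{label.ljust(pad)}  {sequences[label][start:start + width]}"
            for label in labels
        ]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)
```

As a usage example, calling `format_blocks` on two 120-residue sequences labeled `make_label("YAL001C", "S288C")` and `make_label("YAL001C", "W303")` yields three blocks: two of 50 residues and a final one of 20.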

Download Option

A file providing all sequences in the alignment, in FASTA format, can be downloaded by clicking on the "DOWNLOAD" link on the left side of the page immediately beneath the alignment (above the GCG-formatted individual sequences).
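Once downloaded, the FASTA file can be read with a few lines of Python. The sketch below assumes that each header line begins with the same [ORF]_[strain] label used in the alignment rows, which this help page does not state explicitly. The function names are hypothetical.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA record label to its sequence."""
    records = {}
    label = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                records[label] = "".join(chunks)
            # Keep only the first word of the header as the label.
            parts = line[1:].split()
            label = parts[0] if parts else ""
            chunks = []
        elif label is not None:
            chunks.append(line)
    if label is not None:
        records[label] = "".join(chunks)
    return records


def split_label(label):
    """Split an assumed [ORF]_[strain] label at the first underscore."""
    orf, _, strain = label.partition("_")
    return orf, strain
```

Splitting at the first underscore keeps any later underscores as part of the strain name, which seems the safer choice if strain names can contain underscores.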

Individual Sequences

Individual protein or DNA sequences for each strain are provided at the bottom of the page. These sequences are in GCG format and are of the sequence type selected for alignment (Protein or DNA Coding). As mentioned above, individual sequences (Protein or DNA Coding) can also be retrieved in either GCG or FASTA format by clicking the hyperlinked labels to the left of the Sequence Alignment.

Go to Align Strain Sequences