SGD Help: FASTA Searches
FASTA is a program that allows one to
compare a query sequence to either
a protein or a DNA sequence database (Pearson and Lipman, 1988). FASTA uses a fast search to
initially identify sequences with a high degree of similarity to the
query sequence and then conducts a second comparison on the selected
sequences. FASTA is slower than BLAST, but
it can be more sensitive because it tolerates gaps in the aligned sequences.
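The two-stage idea described above can be illustrated with a small sketch. It assumes the fast first pass works by counting short shared words (k-letter words, often called k-tuples) along alignment diagonals, which is the general approach used by FASTA-style programs; the function names, the word size, and the `keep` cutoff here are hypothetical and chosen only for illustration. The slower second comparison on the selected sequences is not shown.

```python
from collections import defaultdict


def ktup_index(query, k):
    """Map each k-letter word in the query to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return index


def best_diagonal_hits(query, subject, k=2):
    """Count shared k-letter words per diagonal and return the best count.

    A diagonal is the offset (subject position - query position). Many word
    hits on one diagonal suggest an ungapped region of similarity.
    """
    index = ktup_index(query, k)
    hits = defaultdict(int)
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            hits[j - i] += 1
    return max(hits.values(), default=0)


def prefilter(query, database, k=2, keep=2):
    """First pass: rank database entries by their best diagonal, keep the top few.

    `database` is a plain dict of name -> sequence (an assumption for this
    sketch). Entries with no word hits at all are dropped.
    """
    scored = sorted(
        ((best_diagonal_hits(query, seq, k), name) for name, seq in database.items()),
        reverse=True,
    )
    return [name for score, name in scored[:keep] if score > 0]
```

Only the sequences returned by `prefilter` would go on to the more careful second comparison, which is where gaps can be taken into account.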
NOTE: when a nucleotide query sequence is entered, FASTA searches both the query sequence and the reverse complement.
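For readers unfamiliar with the term, the reverse complement of a nucleotide sequence can be computed as in the following minimal sketch. The function names are hypothetical, and the sketch handles only the letters A, C, G, T, and N; it is not taken from the FASTA program itself.

```python
# Translation table for complementing bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def both_strands(query):
    """Return the query and its reverse complement, the two forms that
    a search covering both strands would compare against the database."""
    return [query, reverse_complement(query)]
```

For example, `reverse_complement("ATGC")` gives `"GCAT"`.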
SGD also offers a selection of sequence databases that can be
searched, depending on the user's requirements.
- GenBank is the subset of DNA
sequences submitted to GenBank that have been derived from
S. cerevisiae DNA. It includes results of the systematic
sequencing as well as results from individual laboratories.
- genoSc is the complete
Saccharomyces cerevisiae genome sequence as revealed by the
international systematic sequencing effort.
- ORF-Coding consists of the
DNA sequences of all the yeast coding sequences (ORFs) as defined by
the systematic sequencing effort. This dataset contains the stop
codons, but does not contain any introns.
- NotFeature includes the
portion of the systematic sequence that is not an ORF, centromere,
tRNA, RNA gene, or Ty element.
- utr5_sc_500 is a dataset that contains the DNA sequences that are
500 bp upstream of all ORFs defined in the systematic
S. cerevisiae genomic sequence.
- utr5_sc_1000 is a dataset that contains the DNA sequences that are
1000 bp upstream of all ORFs defined in the systematic
S. cerevisiae genomic sequence.
- utr5_sc_2000 is a dataset that contains the DNA sequences that are
2000 bp upstream of all ORFs defined in the systematic
S. cerevisiae genomic sequence.
- ORF-Trans is a dataset
containing protein translations of all the ORFs defined in the
systematic S. cerevisiae genomic sequencing project.
- NRSC is a non-redundant set of
S. cerevisiae proteins. For example, while there may be 10
individual DNA sequences or protein sequences for a particular gene,
it will only be represented once in the NRSC.
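As an illustration of what an upstream dataset such as utr5_sc_500 contains, the sketch below extracts the region upstream of a single ORF. It is not SGD's procedure: the coordinate convention (0-based, half-open, given on the forward strand), the strand labels, and the clipping at chromosome ends are all assumptions made for this example, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chromosome, orf_start, orf_end, strand, length=500):
    """Return up to `length` bases upstream of an ORF, read 5' to 3'.

    Assumes the ORF occupies chromosome[orf_start:orf_end] in forward-strand
    coordinates. For a "+" ORF, upstream lies before orf_start; for a "-"
    ORF, upstream lies after orf_end and is reverse-complemented so that it
    reads in the same direction as the ORF. Regions are clipped at the
    chromosome ends.
    """
    if strand == "+":
        begin = max(0, orf_start - length)
        return chromosome[begin:orf_start]
    if strand == "-":
        end = min(len(chromosome), orf_end + length)
        return reverse_complement(chromosome[orf_end:end])
    raise ValueError("strand must be '+' or '-'")
```

Calling the function with `length=1000` or `length=2000` would give the analogous regions for the larger upstream datasets.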
If the FASTA
search results in no, or very few, matches, the user may try to
increase the number of matches in several ways. When the query
sequence is a nucleotide sequence, check that the Reverse-complement
option near the bottom of the FASTA search form was selected, and
repeat the search with it selected if it was not. From the FASTA
search page, one can also change the database searched (for example,
from genoSc to GenBank), change the protein comparison matrix, or
increase the "Expected Number of Matches." Running a BLAST search on
the same query will sometimes give different results.
The FASTA
Search Page can be accessed by selecting the hypertext link on the
menu bar at the top of most SGD WWW pages.
- Links within SGD
  - BLAST Search
- External links
  - GenBank