SGD

SGD Help: Fungal BLAST Search




Description

BLAST (Basic Local Alignment Search Tool) was developed by Altschul et al. (1990). It is a very fast search algorithm for searching protein or DNA sequence databases. BLAST is best suited to sequence similarity searching rather than motif searching.

A fairly complete online guide to BLAST searching can be found at the NCBI BLAST Help Manual.

The Fungal BLAST search offered by SGD allows users to compare any query nucleotide or protein sequence to fungal nucleotide or protein sequence datasets gathered from GenBank. S. cerevisiae sequences are included in the Fungal BLAST dataset, but may also be searched separately using SGD's S. cerevisiae BLAST tool, FASTA tool, or PatMatch tool (for searches using a query sequence of fewer than twenty residues).

Results of BLAST searches are returned via web browser. There is a separate help document for the Fungal BLAST results page.

Using the Fungal BLAST Search

Submitting query sequences

Sequences can be submitted for a BLAST search in two different ways. The sequence can be uploaded from a local text file with FASTA, GCG, or RAW formatting, or the sequence can be typed or pasted into the Query Sequence window. (Note: The contents of an uploaded sequence file will not be displayed in the Query Sequence window of the search page.)
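To illustrate the difference between FASTA and RAW input, here is a minimal Python sketch that reads either form into a plain sequence string. The function name read_query is hypothetical and is not part of the SGD tool; GCG files are not handled, and how the Fungal BLAST form itself parses uploads is not described in this document.

```python
def read_query(text):
    """Return (header, sequence) from FASTA or raw sequence text.

    FASTA input begins with a '>' header line; raw input is sequence only,
    in which case the returned header is None.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header = None
    if lines and lines[0].startswith(">"):
        header = lines[0][1:].strip()
        lines = lines[1:]
    # Keep only letters and '*' so that spaces and position numbers
    # from pasted listings are discarded.
    joined = "".join(lines)
    sequence = "".join(ch for ch in joined if ch.isalpha() or ch == "*").upper()
    return header, sequence


if __name__ == "__main__":
    print(read_query(">example query\nACGTAC\nGTTGCA\n"))
    print(read_query("acg tac 12 gtt"))
```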

To use the Upload Local File option:

Choosing a BLAST program

The Fungal BLAST search offers four BLAST programs to accommodate different types of searches:

  1. BLASTN compares a nucleotide query sequence against a nucleotide sequence dataset;
  2. BLASTP compares an amino acid query sequence against a protein sequence dataset;
  3. BLASTX compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence dataset;
  4. TBLASTN compares a protein query sequence against a nucleotide sequence dataset dynamically translated in all six reading frames (both strands).
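
The "six-frame conceptual translation" used by BLASTX and TBLASTN can be sketched in Python as follows. This is an illustration only, assuming the standard genetic code; the document does not say which translation table the Fungal BLAST tool uses, and the frame labels (+1 to -3) and function names are chosen here for the example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```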

When the user chooses BLASTN or TBLASTN, the tool automatically selects a nucleotide sequence dataset to search; when BLASTP or BLASTX is chosen, the tool selects a protein sequence dataset.

Note that for some species, only partial protein sequence datasets are available. For these species, a BLASTP or BLASTX search may produce a false negative. If a BLASTP or BLASTX search returns few or no hits, it is advisable to repeat the search using BLASTN or TBLASTN (see next section for more details).
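
The relationship between program, query type, and dataset type described above can be summarized in a small Python lookup table. This is a sketch for illustration, not the tool's implementation. The pairing of each protein-dataset program with a fallback (BLASTP with TBLASTN, BLASTX with BLASTN) is an inference from the query types, and the function names are hypothetical.

```python
# Query type and dataset type for each program, as listed above.
PROGRAMS = {
    "BLASTN": {"query": "nucleotide", "dataset": "nucleotide"},
    "BLASTP": {"query": "protein", "dataset": "protein"},
    "BLASTX": {"query": "nucleotide", "dataset": "protein"},
    "TBLASTN": {"query": "protein", "dataset": "nucleotide"},
}


def dataset_type(program):
    """Return the kind of dataset that the given program searches."""
    return PROGRAMS[program.upper()]["dataset"]


def nucleotide_fallback(program):
    """Return the program that accepts the same kind of query but searches
    a nucleotide dataset, or None if the program already does so."""
    info = PROGRAMS[program.upper()]
    if info["dataset"] == "nucleotide":
        return None
    for name, other in PROGRAMS.items():
        if other["query"] == info["query"] and other["dataset"] == "nucleotide":
            return name
    return None


if __name__ == "__main__":
    for name in PROGRAMS:
        print(name, dataset_type(name), nucleotide_fallback(name))
```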

Choosing a sequence dataset(s)

Single or multiple species may be selected in the sequence selection box of the Fungal BLAST form, by Control-clicking (PC) or Command-clicking (Mac). Clicking on the other category labels, i.e., "Ascomycetes", "Other Public Sequences", etc., selects all sequences in that category.

Currently, the Fungal BLAST search encompasses all fungal sequences in GenBank; we expect to add fungal datasets to it as they are added to GenBank. Sequences are updated every two months, and the last update date is displayed at the top of the BLAST form. Datasets are organized by fungal phylum. The category "Fungal Sequences Not in the above datasets" includes all fungal sequences in GenBank, minus the specific datasets listed above.

Protein sequence datasets are generated based on GenBank records. For some fungal species, including all Saccharomyces species other than cerevisiae, protein predictions based on genome sequencing projects have not been deposited in GenBank (as of early 2007). Therefore, although these protein predictions may be available via other tools or locations (e.g., sequencers' websites), they are not included in the Fungal BLAST datasets. Queries that use BLASTP or BLASTX against these species will return only the few genes whose sequences have been determined and deposited into GenBank individually, rather than those predicted from a complete genome-wide analysis. In these cases the BLASTN or TBLASTN programs, which search nucleotide sequence datasets, will yield more comprehensive and informative results.

Interpreting the results

If a BLAST search results in no, or few, matches, the user may try to increase the number of matches in a number of ways. Going back to the BLAST search page, one can change the datasets searched, change the protein comparison matrix, or increase the number of alignments shown. The choice of nucleotide or protein sequence dataset may also affect the results for some species (see the preceding section of this document).

Changing other options can also change the outcome of the BLAST search. The Expect threshold ("E") reflects the number of matches expected to be found by chance. If the Expect value of a match is greater than the Expect threshold, the match will not be reported. The default E threshold is 10. Decreasing the E threshold increases the stringency of the search, so fewer matches are reported; increasing it decreases the stringency, so more matches are reported.

If a query sequence is short (fewer than about 30 residues), the user will want to lower the Cutoff Score ("S"), which results in a less stringent criterion for reporting matches.

The user can also change the word length ("W"). BLAST first searches for a perfect match of at least the word length; once such a match is found, it tries to extend it into a high-scoring segment pair (HSP). The default W value for BLASTN is 11; for all other programs the default is 3. If the word length is less than 11, the query sequence must be shorter than 5000 bp.
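
As a rough illustration of how the E threshold and the word-length rule behave, the Python sketch below filters a list of hits by Expect value and checks the word-length restriction. The hit names and values are invented for the example, and the function names are hypothetical. The word-length rule is applied literally as stated above; this document does not say whether it applies to every program.

```python
DEFAULT_E_THRESHOLD = 10.0
DEFAULT_WORD_LENGTH = {"BLASTN": 11, "BLASTP": 3, "BLASTX": 3, "TBLASTN": 3}


def filter_hits(hits, e_threshold=DEFAULT_E_THRESHOLD):
    """Keep (name, e_value) hits whose Expect value does not exceed the
    threshold, sorted from most to least significant."""
    kept = [hit for hit in hits if hit[1] <= e_threshold]
    return sorted(kept, key=lambda hit: hit[1])


def word_length_allowed(word_length, query_length):
    """Apply the stated rule: W below 11 requires a query under 5000 bp."""
    return word_length >= 11 or query_length < 5000


if __name__ == "__main__":
    # Invented example hits: (name, Expect value).
    hits = [("hit_c", 12.0), ("hit_a", 1e-30), ("hit_b", 0.5)]
    print(filter_hits(hits))                    # default threshold of 10
    print(filter_hits(hits, e_threshold=1e-5))  # more stringent search
    print(word_length_allowed(7, 6000))
```

Lowering e_threshold in the example drops hit_b as well as hit_c, which mirrors the increase in stringency described above.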

BLAST searches are also subject to filtering. A filter removes repetitive sequences from a query, so that the results of the BLAST search will be less numerous and, ideally, more informative. For nucleic acid query sequences, the "dust" filter is the default. For all other searches, the "seg" filter is the default. You can always use the "Filter options" pull-down menu to select a different filter option or to remove filtering entirely (select "none").
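
To give a feel for what query filtering does, the Python sketch below masks low-complexity stretches using a simple sliding-window Shannon entropy test. It is a simplified stand-in and is not the "dust" or "seg" algorithm; the window size, entropy cutoff, and function names are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Return the Shannon entropy (in bits) of the characters in window."""
    total = len(window)
    counts = Counter(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(seq, window=12, min_entropy=1.0, mask_char="N"):
    """Replace every window whose entropy falls below min_entropy with
    mask_char ('N' suits nucleotides; 'X' would suit proteins)."""
    seq = seq.upper()
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    query = "ACGTGCATCGTC" + "A" * 20 + "CTGCATCGACTG"
    print(mask_low_complexity(query))
```

Masked positions no longer seed matches, which is why a filtered query tends to return fewer, more informative hits.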

Accessing the Fungal BLAST Search Page

The Fungal BLAST search page can be accessed through links on the Analysis & Tools and Homology & Comparisons contents pages. Additionally, the search is available from the Comparison Resources pull-down menu on each Locus page. When the Fungal BLAST search is accessed through an SGD Locus page, the gene or protein sequence of that locus will be entered automatically into the query sequence box.

Other Relevant Links

  1. Links within SGD
    1. FASTA Search Page
    2. PatMatch Search Page
    3. Gene/Sequence Resources
    4. Analysis & Tools contents page
    5. help document for the Fungal BLAST results page
  2. External links
    1. GenBank
    2. NCBI

Go to the Fungal BLAST Search interface.

