SGD Help: Pattern Matching


Description

PatMatch permits the identification of patterns or motifs within the collection of all S. cerevisiae protein or DNA sequences. The pattern can be either a simple string or a regular expression. Standard substitutions are allowed in the string, such as using "R" for any purine base when performing a nucleotide search. Pattern matching offers an alternative to sequence alignment techniques such as BLAST and FASTA for identifying nucleotide or peptide sequences with conserved or biologically interesting regions.
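The "R for any purine" substitution can be illustrated with a small sketch. It is not PatMatch's own code: it assumes the standard IUPAC nucleotide ambiguity codes, and the names `IUPAC_DNA`, `iupac_to_regex` and `find_hits` are hypothetical. It also assumes positions are reported 1-based.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern: str) -> str:
    """Expand each IUPAC code in a plain nucleotide pattern into a regex class."""
    return "".join(IUPAC_DNA[base] for base in pattern.upper())


def find_hits(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(iupac_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


# "R" stands for A or G, so GRT matches both GAT and GGT.
print(find_hits("GRT", "ccGATccGGTcc"))
```

Because the pattern and the sequence are both uppercased first, the search is case-insensitive, as the tips below describe for PatMatch.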

Using PatMatch

SGD offers a selection of sequence datasets that can be searched, depending on the user's requirements.

Tips for Pattern Matching:

  1. The pattern may be lowercase or uppercase. There is no maximum or minimum pattern size.
  2. A description of the allowed syntax of the pattern is provided at the bottom of the Pattern Matching page.
  3. The Strand option is used for restricting NUCLEOTIDE searches to only one strand of the specified dataset. The default is that both strands are searched. If the "Strand in dataset" option is chosen, then only the strand that is actually present in the dataset will be searched. In other words, if the chosen dataset is: Choosing "Reverse complement of strand in dataset" restricts the PatMatch search to the reverse complement of the strands described above.
    Please note that in the displayed sequence, only the Watson strand will be shown, regardless of which strand option is chosen. If your pattern has a match on the Crick strand, the reverse complement of the pattern will be highlighted in the Watson sequence.
  4. The Mismatch, Deletion or Insertion options will permit matches to sequences that contain a defined number of substitutions, deletions or insertions relative to the input pattern. This number can range from 1 to 3. At this time, patterns containing regular expressions do not support the mismatch, deletion and insertion options.
  5. When searching for patterns near the beginning or end of a sequence, bear in mind that nucleotide sequences include the start codon (5' ATG) and the stop codon (TAA, TAG, or TGA). Peptide sequences include the initiator methionine, whether or not it is removed in vivo.
  6. If the genoSc, ORF-Coding, ORF-Genomic, ORF-Genomic-1000, ORF-Trans or NotFeature dataset is searched, both the Chromosome Graphic and the Full Search Result Table are displayed. If the GenBank or NRSC dataset is searched, only the Full Search Result Table is shown. The Chromosome Graphic displays all the hits on the 16 yeast chromosomes; the user may click on any region of any chromosome bar to go to the Features Map and view the hits there. The Full Search Result Table lists the names of the sequences containing a match, the hit number, the matching pattern, the matching position, a link to the DNA or protein sequence, and any available information about the sequence. The matching position is given relative to the entire sequence matched (listed in the Sequence Name column); that sequence may be an entire chromosome, an ORF (DNA or amino acid sequence), or a region of untranslated DNA.
  7. The sequences with hits are listed in the table, ordered by number of hits and sequence name.
  8. At this time, PatMatch will not find overlapping hits.
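Tips 3 and 8 can be sketched together. A hit on the Crick strand is equivalent to finding the reverse complement of the pattern in the Watson sequence, which is why the reverse complement is what gets highlighted. The sketch below is an illustration only, not PatMatch's implementation: the function names and the `"both"`, `"watson"` and `"crick"` option values are hypothetical, and it handles plain A/C/G/T patterns only.

```python
import re

# Translation table for complementing plain A/C/G/T sequences.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a plain A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def search_strands(pattern: str, watson: str, strand: str = "both"):
    """Find non-overlapping hits, reported in Watson-strand coordinates.

    Returns sorted (start, end, strand_label, text_in_watson) tuples with
    1-based inclusive positions. strand_label is "W" or "C".
    """
    watson = watson.upper()
    pattern = pattern.upper()
    queries = []
    if strand in ("both", "watson"):
        queries.append(("W", pattern))
    if strand in ("both", "crick"):
        # A Crick-strand hit appears in Watson as the reverse complement.
        queries.append(("C", reverse_complement(pattern)))
    hits = []
    for label, query in queries:
        # re.finditer resumes after each match, so hits never overlap.
        for m in re.finditer(re.escape(query), watson):
            hits.append((m.start() + 1, m.end(), label, m.group()))
    return sorted(hits)


print(search_strands("AAC", "TTAACGGTTAA"))
print(search_strands("AAA", "AAAAA", strand="watson"))
```

In the second call, the five-base run contains three overlapping occurrences of AAA, but only the first is reported. Note that a palindromic pattern such as GATC would be reported once per strand at the same position in this sketch.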

If a PatMatch search returns few or no matches, the user can return to the PatMatch search page and try to increase the number of matches in several ways: change the dataset searched (for example, from genoSc to GenBank), use a less selective pattern, or increase the number of allowed mismatches, deletions or insertions.
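To show how allowing mismatches widens a search, here is a minimal sketch that counts substitutions in a sliding window. It is an assumption-laden illustration rather than PatMatch's algorithm: it handles substitutions only (not insertions or deletions), takes the leftmost hit and then skips past it, and the name `find_with_mismatches` is hypothetical.

```python
def find_with_mismatches(pattern: str, sequence: str, max_mismatches: int = 1):
    """Return (1-based start, matched text, mismatch count) for each hit.

    Only substitutions are counted. After a hit, scanning resumes past the
    end of that hit, so reported hits do not overlap.
    """
    pattern = pattern.upper()
    sequence = sequence.upper()
    width = len(pattern)
    hits = []
    i = 0
    while i + width <= len(sequence):
        window = sequence[i:i + width]
        # Count positions where the window differs from the pattern.
        mismatches = sum(a != b for a, b in zip(pattern, window))
        if mismatches <= max_mismatches:
            hits.append((i + 1, window, mismatches))
            i += width
        else:
            i += 1
    return hits


target = "ccGATTACAccGACTACAcc"
print(find_with_mismatches("GATTACA", target, max_mismatches=0))
print(find_with_mismatches("GATTACA", target, max_mismatches=1))
```

With no mismatches allowed, only the exact copy at position 3 is found; allowing one mismatch also picks up GACTACA at position 12.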

Aborting a PatMatch Search
To abort a search, click the button labeled "Click here to abort the search". This stops the process running on the SGD server. It is preferable to using the browser's "Back" button, which leaves the SGD server processing the search request.

Accessing the PatMatch Search Page

PatMatch can be accessed:
  1. by selecting the "PatMatch" hypertext link on the tool bar at the top of most SGD web pages.
  2. by selecting the "Pattern Matching" link on the Analysis & Tools contents page.

Other Relevant Links

  1. Links within SGD
    1. BLAST Search Page
    2. FASTA Search Page
  2. External links
    1. GenBank
