Saccharomyces Genome Database

SGD Help: Querying SAGE Data



Description

The SAGE technique (Serial Analysis of Gene Expression) has been used to analyze the expression profile of thousands of genes across the yeast genome, i.e. the yeast "transcriptome" (Velculescu et al. (1997) Cell 88:243-251). A SAGE tag is a 14-nucleotide sequence that has been found within an mRNA. The relative abundance of a particular SAGE tag within a pool of tags gives some indication of the level of expression of the gene(s) containing that tag.

Please note: to interpret the expression data, it is essential to be familiar with the SAGE technique. For instance, if there are two or more SAGE tags within a given ORF, only the data from the 3'-most tag reflect that ORF's expression. In addition, expression data obtained from SAGE tags that are not unique in the genome may reflect expression from more than one location. For additional information about the SAGE technique, see Velculescu et al. (1995) "Serial analysis of gene expression," Science 270:484-487.
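As a rough illustration of what "relative abundance within a pool of tags" means, the following Python sketch counts how often each tag occurs and divides by the pool size. The function name and the tag sequences are made up for this example; they are not SGD data or SGD code.

```python
from collections import Counter


def relative_abundance(tags):
    """Return each tag's share of the pool as a fraction between 0 and 1."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical 14-nucleotide tags (invented sequences, not real SAGE results).
tag_a = "CATG" + "A" * 10
tag_b = "CATG" + "C" * 10
pool = [tag_a, tag_a, tag_b, tag_a]

abundance = relative_abundance(pool)
print(abundance[tag_a])  # 0.75
```

A tag that matches more than one genomic location would still receive a single count here, which is why non-unique tags need the careful interpretation described above.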

Each SAGE tag is put into one of four "classes" based on its location relative to known ORFs and is assigned a color in graphic displays:
1 - within an ORF (orange);
2 - within 500 bp 3' of an ORF (violet);
3 - on the strand opposite an ORF (yellow);
4 - none of the above (bright pink).
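The four classes can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the procedure SGD used: a tag is reduced to a single coordinate, strands are labeled "W" and "C", class 2 is assumed to require the same strand as the ORF, class 3 is assumed to mean a position inside an ORF on the opposite strand, and the precedence 1, then 2, then 3 is assumed. The names `ORF` and `classify_tag` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ORF:
    start: int   # lowest chromosomal coordinate
    end: int     # highest chromosomal coordinate
    strand: str  # "W" (Watson) or "C" (Crick)


CLASS_COLORS = {1: "orange", 2: "violet", 3: "yellow", 4: "bright pink"}


def classify_tag(pos, strand, orfs, window=500):
    """Return the class (1-4) of a tag at `pos` on `strand` (assumed rules)."""
    best = 4
    for orf in orfs:
        inside = orf.start <= pos <= orf.end
        if inside and strand == orf.strand:
            return 1  # within an ORF
        # The 3' side of an ORF depends on its orientation.
        if orf.strand == "W":
            downstream = orf.end < pos <= orf.end + window
        else:
            downstream = orf.start - window <= pos < orf.start
        if downstream and strand == orf.strand:
            best = min(best, 2)  # within 500 bp 3' of an ORF
        elif inside and strand != orf.strand:
            best = min(best, 3)  # on the strand opposite an ORF
    return best


orfs = [ORF(1000, 2000, "W")]
tag_class = classify_tag(2100, "W", orfs)
print(tag_class, CLASS_COLORS[tag_class])  # 2 violet
```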

SGD provides both a Simple and an Advanced Query to access the SAGE data. The Simple Query allows you to search the SAGE data by gene or ORF name or by chromosomal region, while the Advanced Query allows you to search by gene or ORF name, tag sequence, or relative expression values.

Using the Simple SAGE Query

You can search by gene or ORF name, or browse a chromosomal region with a Simple SAGE Query.
  1. Enter a gene or ORF name:

    With this search option, you can enter a gene or ORF name. After you have entered the name, hit the "Return" key or click on "query by gene." You can also enter part of a name and use the wildcard character (*). If a wildcard search matches more than one gene or ORF, a list of possible hits will be presented from which you can then select a single gene or ORF.

    The output of this search option is a Chromosomal SAGE Map showing the SAGE tags near the chosen gene or ORF. In the display, the requested gene or ORF is highlighted in red text. Tags are indicated with colored triangles. Tags that are unique in the genome are boxed.


    1. Class 1, 2, and 3 tags: Clicking on a class 1, 2, or 3 tag links to general information about the tag sequence (see image directly below), such as its coordinates, matching ORF (if any), location relative to that ORF, and whether it occurs one or more times in the genome (for class 4 tags, additional information is provided, as explained further below).


    2. Class 4 tags: In the case of class 4 tags, additional information is provided. Class 4 tags are neither within an ORF, opposite an ORF, nor within 500 nt of the 3' end of an ORF. Therefore, their presence in the SAGE data may provide evidence for the existence of ORFs not yet annotated within the SGD ORF dataset, so-called non-annotated ORFs, or NORFs.

      Note: A link at the top of the potential ORF page for a class 4 tag leads to the SAGE tag information for that tag.

      The additional information about class 4 tags is three-fold:

      1. First, a graphical display is presented that shows potential non-annotated ORFs (NORFs) in the vicinity of the class 4 tag. The putative ORFs are color-coded: red if the SAGE tag would be within the ORF, green if the SAGE tag would be 3' of the ORF.

      2. Second, there is a table that provides the chromosomal coordinates, codon adaptation index, and length of each potential NORF. If a BLAST comparison of that NORF against the protein sequences at GenBank suggests a homologous protein may exist, then the P-value of the best match is given. Clicking on that P-value brings up the results of the BLAST comparison. Finally, the table provides links to some analysis tools:

  2. Examine a chromosomal region

    With this option, you can click on any section of one of the red bars representing the chromosomes to go to the Chromosomal SAGE Map described above (see item 1 above: "Enter a gene or ORF name").
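The wildcard search described under "Enter a gene or ORF name" can be sketched in Python as follows. This is a minimal illustration, not SGD's implementation: it assumes that * stands for any run of characters and that matching ignores case, and the helper `match_names` and the short name list are chosen only for the example.

```python
import re


def match_names(pattern, names):
    """Return the names matching `pattern`, where * matches any characters."""
    # Escape everything except *, which becomes the regular expression ".*".
    regex = re.compile(
        ".*".join(re.escape(part) for part in pattern.split("*")),
        re.IGNORECASE,
    )
    return [name for name in names if regex.fullmatch(name)]


# A small hand-picked list of names, not the full SGD gene set.
names = ["MYO1", "MYO2", "MYO3", "ACT1", "CDC28"]

print(match_names("MYO*", names))  # ['MYO1', 'MYO2', 'MYO3']
print(match_names("act1", names))  # ['ACT1']
```

When a pattern matches several names, as with MYO* here, the query page presents the list of hits so that a single gene or ORF can be selected.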


Using the Advanced SAGE Query

The Advanced SAGE Query differs from the Simple SAGE Query in two primary ways. First, all queries are entered in a relatively simple query syntax, which is described on the SAGE Advanced Query page and allows for more query options. Second, all query results are returned in tabular form of the type shown below. The Chromosomal SAGE Map for a resulting ORF is linked from this table (click the tag coordinate listed under the COORD [Map Link] column). Information about a SAGE tag is found by clicking on the SAGE tag sequence in the table.

There are three general types of queries which can be made:

  1. Search for a Gene or ORF name alone or with another parameter:

    This query can be made by entering the phrase "GENE = X" or "ORF = X," where X is the gene or ORF name. A table is returned with all tag sequences associated with the entered gene or ORF name (see above for an example of the entry GENE=CDC15).

    It is also possible to search for tags associated with a gene or ORF name that also satisfy another query restriction. For instance, to find all CDC28-associated tags whose G2M expression value is greater than 2, use the following query: (GENE=CDC28) AND (G2M>2). Several more examples are listed in the SAGE Query Examples Table below.

  2. Query for relative expression values for the three different growth conditions studied, either alone or in combination with another parameter:

    The SAGE study compared gene expression under three different growth conditions:

    1. L: early log phase growth
    2. S: growth arrest at S phase (via hydroxyurea)
    3. G2/M: growth arrest at G2/M (via nocodazole); written as G2M in queries

    This search feature allows you to retrieve SAGE tags by their relative expression levels under the three growth conditions. For example, if you enter "S>L," you will retrieve all the SAGE tags that are expressed at a higher level in S phase arrest than in log phase growth. As with gene and ORF names, another query restriction can be added. For instance, you can locate all unique tags where S > L using the following query: (HITS=1) AND (S>L). Several more examples are listed in the SAGE Query Examples Table below. You also have the option to sort the results in descending or ascending order by the values for any of the growth conditions.

  3. Search for a particular SAGE tag sequence, alone or in combination with another query parameter:

    You can search for a particular SAGE tag sequence or set of tag sequences using the syntax TAG = X, where X is either a full 14-nucleotide sequence or a shorter sequence that includes a wildcard character (*). It is also possible to search for all SAGE tag sequences using the number of hits in the genome as the search criterion (e.g. HITS=1). These two parameters can be combined (e.g. (TAG=CATGCAA*) AND (HITS=2)). In addition, either a TAG or a HITS query can be combined with a gene or ORF name or a relative expression value phrase. Several more examples are listed in the SAGE Query Examples Table below.
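To make the three query types concrete, the sketch below models a few tag records in Python and expresses three of the queries quoted above as list filters. It is an illustration only: the `TagRecord` fields, the helper `tag_matches`, and all record values are invented for the example and do not reflect real SAGE data or SGD's query engine.

```python
from dataclasses import dataclass


@dataclass
class TagRecord:
    tag: str    # 14-nucleotide tag sequence
    gene: str   # associated gene name
    hits: int   # number of locations in the genome
    L: float    # expression value, log phase
    S: float    # expression value, S phase arrest
    G2M: float  # expression value, G2/M arrest


# Made-up records for illustration only.
records = [
    TagRecord("CATGCAAGGTTACC", "CDC28", 1, 3.0, 5.0, 4.0),
    TagRecord("CATGCAATTTGGCA", "CDC28", 2, 1.0, 0.0, 1.0),
    TagRecord("CATGTTTGACCAGT", "ACT1", 1, 120.0, 90.0, 60.0),
]


def tag_matches(pattern, tag):
    """Treat a trailing * as 'any remaining nucleotides' (assumed meaning)."""
    if pattern.endswith("*"):
        return tag.startswith(pattern[:-1])
    return tag == pattern


# (GENE=CDC28) AND (G2M>2)
q1 = [r for r in records if r.gene == "CDC28" and r.G2M > 2]
# (HITS=1) AND (S>L)
q2 = [r for r in records if r.hits == 1 and r.S > r.L]
# (TAG=CATGCAA*) AND (HITS=2)
q3 = [r for r in records if tag_matches("CATGCAA*", r.tag) and r.hits == 2]
# Sort results in descending order by the G2M value.
by_g2m = sorted(records, key=lambda r: r.G2M, reverse=True)

print([r.tag for r in q1])  # ['CATGCAAGGTTACC']
print([r.tag for r in q3])  # ['CATGCAATTTGGCA']
```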

SAGE Query Examples Table

Note: All queries are case insensitive.
Search criteria               Description of query
GENE = ACT1                   Retrieve tags located within the gene ACT1
GENE = MYO*                   Retrieve tags located within any of the MYO genes
ORF = YNL301C                 Retrieve tags located within the ORF YNL301C
ORF = YNL*C                   Retrieve tags located on the Crick strand of the left arm of Chromosome XIV
TAG = CATGATTT*               Retrieve tags that start with the sequence CATGATTT, followed by any nucleotides
HITS > 5                      Retrieve tags that have more than 5 sequence locations within the genome
(L>S)AND(S>G2M)               Retrieve tags that are expressed at a higher level in log phase than in S phase AND at a higher level in S phase than in G2/M
(L>100)OR(S>100)              Retrieve tags that are expressed at a level higher than 100 in either log phase OR S phase arrest
L BETWEEN 100 AND 200         Retrieve tags whose expression value falls between 100 and 200 in log phase
GENE IN (CLN1, CLN2, CLN3)    Retrieve tags located in any of the genes CLN1, CLN2 or CLN3
L>10                          Retrieve all tags where the value for log phase is greater than 10
(L!=0)AND(S!=0)AND(G2M!=0)    Retrieve tags whose expression values in log phase, S phase, and G2/M are all not equal to zero
S != 0 AND L/S > 10           Retrieve tags whose expression value in log phase is more than 10-fold higher than in S phase AND whose S value does not equal zero
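The last example pairs the ratio test with S != 0 because a ratio is undefined when the S value is zero. A minimal Python sketch of the same condition is given below; the function name is hypothetical, and how SGD's query engine itself handles zero values is not stated here, so this shows only the arithmetic.

```python
def more_than_tenfold_higher_in_log(L, S, fold=10):
    """True when S is non-zero and L/S exceeds `fold`."""
    # Python's `and` short-circuits, so L / S is never evaluated when S is 0.
    return S != 0 and L / S > fold


print(more_than_tenfold_higher_in_log(120, 10))  # True: 120/10 = 12
print(more_than_tenfold_higher_in_log(100, 10))  # False: exactly 10 is not > 10
print(more_than_tenfold_higher_in_log(50, 0))    # False: S is zero
```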


Accessing the Query SAGE Data Page

Other Relevant Links

  1. Links within SGD
    1. The paper by Velculescu, et al., describing the SAGE technique.

Go to Query SAGE Data
Go to SAGE Advanced Query

Last update 2005-11-11 ELH

