SGD Help: YeastMine
YeastMine (YM) offers a quick, powerful way to search the data in SGD, providing both basic searches and flexible ways to customize your search for more advanced queries. Even more powerful is the fact it allows you to save your results, download them in various formats, and even use them as the subject of further YeastMine searches. It also provides built-in tools for analysis of your results (view the videos 8 Cool Things you can do with YeastMine and YeastMine is Awesome for quick overviews). YeastMine is described in detail in Balakrishnan et al (2012).
For step-by-step help using various YeastMine features, view our list of YeastMine Video Tutorials.
- Basic Information
- What is YeastMine?
- What features make YeastMine a powerful and flexible search tool?
- How do I perform specific searches of SGD data using YeastMine?
- Does YeastMine support a Quick Search feature?
- Can I customize a pre-made query?
- Can I run my search on a custom list of genes?
- What options are available for analyzing a list of genes?
- How do I build my own search from scratch?
- Can I save my search results?
- How do I export my search results and which formats are available?
- Can I combine the results of different searches to make a new list of genes for subsequent searches?
- What is MyMine?
- Example Searches (queries) and Uses
- Gene Ontology (GO) Data
- Phenotype Data
- Sequence and Genomics Data
- Interaction data
- Glossary of terms and concepts in YeastMine
YeastMine (YM) is a data search and retrieval tool that incorporates the many different types of data present in SGD and allows users to query across multiple data types in different combinations, facilitating analysis. YeastMine provides custom search and download capabilities. Datatypes that can be queried in YeastMine include chromosomal features, sequences, protein features, GO annotations, phenotypes, interaction data, expression data, and curated literature.
The YeastMine (YM) homepage has several options to query for data in SGD. A gene of interest can be searched for using the Search box, or a list of genes or GO terms can be analyzed using the Analyze option. In addition, a small list of popular pre-defined search forms, or templates, is available on the YeastMine (YM) homepage for quick retrieval of results. Pre-made lists of genes are also accessible from the home page. The tool bar, present on all YeastMine pages, also offers easy access to the core set of YeastMine (YM) operations. Use of Templates, Lists and other concepts is explained below using specific examples.
A subset of YeastMine features are highlighted below. Much more can be accomplished with this tool using a combination of these features.
- Quick retrieval of results by both basic and advanced searches.
- Searches most data types in SGD (examples: GO, Phenotype, Literature, Expression data...)
- Preset basic search forms.
- Search results can be saved for future use.
- Saved search results can be combined, intersected, or subtracted with other saved results of the same data type.
- Saved search results can be downloaded in a variety of formats.
- New searches can be performed on the saved results of a previous search.
- Basic searches can be customized for more flexible, advanced searches.
- Advanced searches can be created from scratch, allowing flexibility and powerful searches across different data types.
- Provides the ability to quickly analyze a list of search results (GO enrichment widget).
- Offers a personal account, MyMine, which allows users to save their results and custom searches so they can be used again.
YeastMine offers a variety of preset, basic search forms, referred to as Templates. Different forms query different data types, including GO, Sequence, Phenotype, Homolog, and Interaction data. They can be accessed from the Templates option on the navigation bar. Information on how to use these preset search forms is available below and can be viewed on the Templates Basics video.
A quick search box is available on the top right corner of all YM pages which can be used to search the YM database for many concepts such as a key word, a gene name, an author name, ontology term or a pathway name. Search results or hits are grouped in categories.
Pre-made search forms (Templates) can be modified to constrain the query to more specific fields or to include more data in the output. Select the "Edit Query" button on the Template. This opens up a feature (QueryBuilder) that allows the user to browse the different types of data (using the Model browser) and either add a constraint to the search or include or exclude additional selected fields in the results. For step by step instructions on editing an existing search form (Template), view the tutorials Editing Templates Part 1 and Editing Templates Part 2.
Yes, any search (preset or customized Template) can be run on a selected gene List. YeastMine has preset gene Lists (e.g. Verified_ORFs, Uncharacterized_ORFs, etc.), but you can also create your own customized List. To create a list of genes, click on the Lists tab on the top menu bar, making sure you are on the Upload tab. Create a List, either by typing (or pasting) in a list of gene names, or by uploading a .txt file. Give the List a name and save it (for step-by-step instructions, please view the video tutorial Creating/Using Lists). When running a search, most Templates offer the option to search for either a single gene or to search a gene List. If you have saved Lists, the pull-down will show each of them, along with the preset Lists, allowing you to select one to search. For more information, please view the tutorial Using Lists in a Template. Please note that a newly created List will be saved within a YeastMine session, but not between YeastMine sessions. To save your List so you can use it at a different time, create a MyMine account, as described in the Personalized YeastMine section. Lists of data types other than genes can also be created using the same procedure.
Once a List has been created and saved, clicking on the name of the List automatically displays an analysis of the genes in the List. There are four separate analyses, each performed by a different widget. These include GO enrichment (to identify significant shared GO terms for the list of genes), Publication Enrichment, Pathway Enrichment, and Interactions (providing a list of genes that interact with genes on the List). The GO enrichment widget is described in detail in the video tutorial Widgets: GO Term Enrichment.
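As a rough illustration of what an enrichment analysis computes, the sketch below scores over-representation of a single GO term with a hypergeometric test, which is one common approach. It is an assumption that this resembles the calculation behind the widget; the function name and the example numbers are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(list_hits, list_size, background_hits, background_size):
    """Probability of seeing at least `list_hits` annotated genes when
    `list_size` genes are drawn without replacement from a background of
    `background_size` genes, `background_hits` of which carry the GO term."""
    total = comb(background_size, list_size)
    upper = min(list_size, background_hits)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 8 of the 20 genes in a List carry a term that
# annotates 100 of 6000 background genes.
p = hypergeom_pvalue(8, 20, 100, 6000)
print(f"p = {p:.3g}")
```

A small p-value suggests the term appears in the List more often than expected by chance; in practice the values for many terms would also need correcting for multiple testing.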
If none of the preset search forms match your intended search, a new search form (Query) can be built from scratch using QueryBuilder. A link to QueryBuilder exists on the YeastMine main menu bar, making it easily accessible. The first step in building a new Query involves selecting the type of data you would like to retrieve. Most searches involve looking for genes that match specific criteria, in which case "Gene" should be picked as the starting data type (referred to as class name). Once the primary data type has been selected, the subsequent page shows the QueryBuilder tool, with the Model browser on the left and an overview of the new Query, changing as it is built, on the right panel. The Model browser displays the data available in YM in an easy to navigate form. Use the Summary, Show and Constrain buttons within the Model browser to build your Query on the right. The Summary button will select all the primary data associated with the Gene, listed as fields under the class name Gene (e.g. SGDID, Systematic name, etc.), while the Show button allows you to add individual fields for Gene (e.g. SGDID) to the Query. Depending on what you select, you will see it appear in the Query Overview. The Constrain button lets you specify any field you would like to constrain. For instance, to search for genes that have a specific phenotype observable, click on the Constrain button next to "Observable" under the class name Phenotype (after you have selected Summary next to the Gene class). For step by step instructions on how to use QueryBuilder, please view the video tutorial Editing Templates Part 2.
Yes, you can save specific rows or all rows of your search results as a List. Please note that the results of a search may appear on one or more pages, depending on the number of data rows retrieved. To save your results as a List, select all results using the check box next to the class name (e.g. Gene) in the blue menu bar, or individual results (rows) by checking their specific boxes, and click "Create List" in the gray menu bar. You can then name your List of search results and Save. For step by step instructions, please view the video tutorial Saving Search Results as a List. Please note that a newly created List will be saved within a YeastMine session, but not between YeastMine sessions. To save your List so you can use it at a different time, create a MyMine account, as described below.
Search results can be exported using the Export Button located in the gray menu bar on the results page. Results may be exported 1) as comma or tab separated values suitable for import into Excel, 2) to Galaxy, or 3) in FASTA or GFF3 formats.
Can I combine the results of different searches to make a new list of genes for subsequent searches?
YeastMine not only allows you to create and save Lists of genes, but also to form new Lists by combining Lists, finding genes in common between Lists, or subtracting one List from another. The new List can then be used in subsequent searches. View the tutorial List Operations for instructions on these various operations. In addition to performing operations between Lists, YeastMine allows you to add selected items from either search results or an existing List to another existing List. View the tutorial Adding Objects to a Saved List for detailed instructions.
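The three List operations correspond to ordinary set operations. The sketch below is only an illustration using Python sets; the List names and their contents are hypothetical and are not actual YeastMine Lists.

```python
# Hypothetical gene Lists; names and contents are illustrative only.
heat_sensitive = {"ACT1", "ASK1", "SGD1"}
cold_sensitive = {"ASK1", "SGD1", "YBR126C"}

combined = heat_sensitive | cold_sensitive   # combine: genes in either List
in_common = heat_sensitive & cold_sensitive  # intersect: genes in both Lists
heat_only = heat_sensitive - cold_sensitive  # subtract: in the first List only

print(sorted(combined))
print(sorted(in_common))
print(sorted(heat_only))
```

Note that subtraction depends on order: subtracting the second List from the first gives a different result from subtracting the first from the second.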
YeastMine will save Lists within a session, but the only way to save them between sessions is to create a personalized MyMine account. Creating a MyMine account requires only an email address and password. If you are logged in to your account, anytime you create and save a List with a name, or run a Query, it will be permanently saved. In addition, if you want to save a new, custom search form (Template) for reuse at another time (for instance, on another set of input genes or a different List), you must have a MyMine account. For more information on saving Lists, Queries and custom Templates in MyMine, please view the video tutorials MyMine 1: Save Lists/Queries and MyMine 2: Create/Save Templates.
To accomplish this task you need to use two operations in YeastMine:
You can upload a list of genes using the Lists option available in the tool bar or using the Analyze option on the home page of YeastMine or create a new list by typing or pasting in a list of genes. The Lists option lets you save the genes in a file (with a name of your choice).
To get to the pre-defined query form, go to the homepage of YM, click on the GeneOntology tab. Pick Gene --> GO Terms option and you will land on the desired preset search form or template. Since the goal is to retrieve annotations for a list of genes, check the box next to 'Constrain to be IN saved Gene List' and you will be able to select the file that you saved earlier. Clicking 'Show results' will retrieve the GO annotations for all the genes in your list.
Uncharacterized genes (i.e. genes for which a function hasn't been identified) can be identified using the Gene Ontology annotation to the root node: Molecular_Function (GO:0003674) unknown. Use the 'GO ID --> Genes' or 'GO Term --> All genes' templates to query using GOID (GO:0003674) or the GO term 'molecular_function' respectively to retrieve genes for which a function is not known.
Relevant video tutorial: Template Basics
To retrieve a list of proteins localized in the mitochondria, you can again use the Gene Ontology Templates 'GO ID --> Genes' or 'GO Term --> All genes'. Enter "GO:0005739" or "mitochondrion" as the GO ID or the GO term name and this should retrieve all the gene products annotated to mitochondrion in SGD.
Relevant video tutorial: Template Basics
This query requires three steps.
- Create your desired List of 'function unknown' as described above. From the Results page, select the 'Gene>Primary SGDID' column to create and save a list of 'function unknown' genes.
- Select the preset search form (template) that finds homologs for a list of genes. Go to the Templates page, and select the preset search form (template) Gene --> Homologs.
- Edit the search to constrain the output to human homologs. On the search form, click on the checkbox next to "constrain to be" which will allow you to select the gene list you created in step 1. By default, this search retrieves the homologs for all available organisms. To find only human homologs, constrain the query by clicking on "Edit Query," which will take you to QueryBuilder. The left panel displays the Model browser that allows you to edit the preset search; the right panel shows the preset search and shows changes to the search made using the Model browser. Within the Model browser, click on the + sign next to Homologs to open it, then scroll down and click on "Organism." Click on the Constrain box next to the "Common Name" to display a small window where you can select 'human' from the pull down and click "Add to query." Finally click "Show results" to retrieve the human homologs for your list of genes annotated to 'function unknown.'
This is a fairly basic phenotype search, and there is therefore a preset search form (Template) that allows you to find all genes that exhibit a specific phenotype. Find this preset search form by clicking on Phenotype in the gray menu bar on the YeastMine home page. The name of the preset search form 'Phenotype --> Genes' matches the desired search of starting with a phenotype and searching for genes annotated to it. Click on this preset Template and select 'Viable' from the pull-down menu. Clicking "Show Results" will retrieve all genes with the phenotype observable "Viable."
Relevant video tutorial: Template Basics
This search is similar to the one above, except for the fact that there are 2 types of temperature sensitive phenotypes, so there are 2 observables to select - 'Cold sensitivity' and 'Heat sensitivity,' requiring the selection of both simultaneously. Once again, select the preset search form 'Phenotype --> Genes.' From the pull-down, select 'Cold sensitivity,' then hold the Command Key down for the Macintosh and click on 'Heat sensitivity' to select both observables to be used simultaneously in the search. Once again, click "Show Results" to retrieve the list of all genes exhibiting temperature sensitive phenotypes.
Relevant video tutorial: Template Basics
First, create a list of your genes. You can either upload a list of genes using the Lists option available in the tool bar (click on Lists on the purple menu bar located at the top of most YeastMine pages and make sure you are on the Upload sub-tab) and name the new List, or use the Analyze option on the home page of YeastMine to create a list of genes. For the latter, type or paste in Standard names (e.g. ACT1, ASK1, SGD1), Systematic names (e.g. YBR126C), or SGDIDs (e.g. S000000330) in the Analyze box and then click the "Analyze" button to create your list by giving it a name. Once you have your list, you can use it on any search form. For this search, select the 'Gene --> Chromosomal location' search form (Template) found on the YeastMine Home page (also found on the Templates page), then click on the check box next to "constrain to be." This allows you to pick a gene list from the pull down. Select the name of the gene list you created to retrieve the coordinates for the genes in your list.
For this operation, you can use the Regions feature in YeastMine. To get to this resource, select "Regions" on the purple navigation bar found at the top of most YeastMine pages. Once on this form, select the feature type(s) you would like to retrieve (ORF, tRNA etc) in Step 2. In Step 3, you can either paste the coordinates of one or more chromosomal regions (e.g. chrIII:1356..20455) or upload a file with the list of coordinates. You can retrieve features from multiple chromosomes at a time. In addition, you can also extend the coordinate regions on either side using the sliding bar in Step 4.
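To make the coordinate format and the effect of extending a region concrete, here is a minimal sketch that parses a region string such as chrIII:1356..20455 and widens it by a fixed flank. The `Region` type and both function names are hypothetical, and the pattern assumes only the chromosome:start..end form shown above; it is not how the Regions form itself is implemented.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chromosome: str
    start: int
    end: int


# Assumed input form: <chromosome>:<start>..<end>, e.g. chrIII:1356..20455
_REGION_PATTERN = re.compile(r"^(\w+):(\d+)\.\.(\d+)$")


def parse_region(text: str) -> Region:
    """Parse one region string into chromosome, start, and end."""
    match = _REGION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start is greater than end: {text!r}")
    return Region(match.group(1), start, end)


def extend_region(region: Region, flank: int) -> Region:
    """Widen a region by `flank` bases on each side.

    The start is clamped at position 1; the end is not clamped because
    this sketch does not know the chromosome length."""
    return Region(region.chromosome, max(1, region.start - flank), region.end + flank)


region = parse_region("chrIII:1356..20455")
print(extend_region(region, 500))
```

Several regions can be handled by applying `parse_region` to each line of a pasted or uploaded list.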
This can be accomplished in one step using the Lists feature in YeastMine. Copy and paste the systematic names into the create list dialog box. The resulting list automatically retrieves all known names for the genes in your input list, including standard names and aliases. You can then export this list and add additional data to the exported list such as Description etc as well.
Relevant video tutorial: Creating/Using Lists
This is a basic search, and thus a preset search form (Template) named 'All genes of a selected Feature Type-->Genes with introns' is available to retrieve this specific list. The search form offers an option to select a specific feature type that has introns.
A pre-defined search form called All genes in organism --> All overlapping genes is available for this task. The results page displays details on the gene and the overlapping gene. This template will retrieve overlapping features for Dubious, Uncharacterized, and Verified ORFs, pseudogenes, transposable element genes, and RNA genes.
Relevant video tutorial: Template Basics
Searching for interacting genes or proteins is simple using the preset search form (Template) named 'Gene --> Interaction.' However, since there are different types of interaction data in SGD and genomic length information is desired, this basic query needs to be edited to 1) limit the search to physical interactions and 2) retrieve the genomic lengths for both of the interacting genes. This can be accomplished using the 'Edit Query' option available using the following steps:
- Click "Edit Query" on the 'Gene --> Interaction' search form to edit this form in QueryBuilder.
- Specify that the results will display the genomic lengths for the input genes. On the Model browser panel of QueryBuilder, find "Length" under the class named "Gene" and click "Show" directly to its right. You will see "Length" added under "Gene" on the right side of the page, or the "Query Overview."
- Constrain the search to identify gene pairs whose products exhibit physical interactions. Back in the Model browser, find the class "Interactions" and click on the [+] sign to see the individual Interactions fields. Since the search is for a specific type of Interaction, physical interactions, locate the "Interaction Type" field and select "Constrain." A box will pop up that allows you to constrain your search to "physical interactions" by selecting this field from the pull-down. Once "physical interactions" is selected, click "Add to query" to add this constraint to your search. Look on the Query Overview panel and you will see that Interaction Type now has "= physical interactions" under it.
- Specify that the results will display the length of the gene that interacts with the input gene. Back in the Model browser window, scroll down further to 'Interacting Genes' under "Interactions" and click on the [+] symbol to see the fields available for "Interacting Genes." Click "Show" next to "Length" under this section; on the Query Overview panel, you should now see "Length" displayed under Interacting Genes, which is located under "Interactions." The QueryBuilder page also allows you to customize the columns that are displayed in the Results table.
- View your results. Now that your customized search is specified, you can run the search. If desired, you can first look at the "Fields selected for output" feature below the Model browser and Query Overview and add, delete or rearrange fields you would like to display in your results table. For instance, perhaps you don't need to see the "Systematic Name" for the interacting gene in your results. In this case you can delete (by clicking the red "x") the box that specifies this ("Gene > Interactions > Interacting Genes > Systematic Name"). Note you could also do this by clicking the red "x" next to the "Systematic Name" under "Interacting Genes" within the Query Overview, but you can't rearrange the order of the output in the Query Overview. The "Fields selected for output" feature also allows you to choose the column on which to sort the output, including ascending or descending order. When you are satisfied with the fields that will display in the results table and their order, click "Show results" to view the results of your search. Note that each row of the results displays an input gene from your list and its interacting gene, as well as the genomic length of each gene in the interacting pair.
This query involves using the "Gene --> Interaction" search form to create lists of genes that interact with each individual gene on a specified list, and then exporting this List in a format that allows the list to be manipulated in order to visualize the shared interacting partners.
1. Create a list of genes for which you would like to find interacting proteins. You can either upload a list of genes using the Lists option available in the tool bar (click on "Lists" on the purple navigation bar and make sure you are on the Upload sub-tab) and name the new List or use the Analyze option on the home page of YeastMine to create a list of genes. For the latter, type or paste in Standard names (e.g. ACT1, ASK1, SGD1), Systematic names (e.g. YBR126C), or SGDIDs (e.g. S000000330) in the Analyze box and then click the "Analyze" button to create your list by giving it a name.
2. Select the 'Gene --> Interaction' search form. This is a basic search that searches for all the interactions for a given gene or list of genes, and is available as a preset search form (Template) in YeastMine.
3. Constrain the search to your saved gene list. Since you want to find interactions for those genes on your saved genes list, click the checkbox "constrain to be," then select [IN] from the first pull-down and the name of your list from the subsequent pull-down (after "saved Gene list"). Click "Show Results".
4. Export your results in the desired format. Once on the results page, click "Export" from the menu immediately above the results table. Choose a desired format (for instance, comma separated for Excel).
5. Manage your file. Open your exported file in an application such as Excel and process it using a pivot table or some other method to visualize the shared list of interacting genes.
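As an alternative to a spreadsheet pivot table, the grouping in step 5 can be sketched in a few lines of Python. The column names and rows below are hypothetical stand-ins for an exported comma-separated file; a real export will have different headers, so adjust the two column names accordingly.

```python
import csv
import io
from collections import defaultdict

# Stand-in for an exported comma-separated file.
# Column names and rows are hypothetical.
exported = io.StringIO(
    "input_gene,interacting_gene\n"
    "ACT1,GENE_A\n"
    "ACT1,GENE_B\n"
    "ASK1,GENE_B\n"
    "SGD1,GENE_B\n"
    "SGD1,GENE_C\n"
)

# Group the input genes by the partner they interact with.
partners = defaultdict(set)
for row in csv.DictReader(exported):
    partners[row["interacting_gene"]].add(row["input_gene"])

# Keep only the partners shared by two or more input genes.
shared = {
    partner: sorted(genes)
    for partner, genes in partners.items()
    if len(genes) >= 2
}
print(shared)
```

To read a real export, replace the `io.StringIO` stand-in with `open("your_export.csv", newline="")`, where the file name is whatever you chose when exporting.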
Chromosomal features include both non-coding DNA feature types and feature types that code for a protein or RNA gene product. Chromosomal features retrieved by YM include Uncharacterized and Verified ORFs, pseudogenes, transposable elements and transposable element genes, RNAs, telomeric regions, centromeres, ARS, genes Not in Systematic Sequence of S228C, and genes Not Physically Mapped. Not all chromosomal features have been mapped on to the reference genome sequence of S. cerevisiae.
Feature types are categories of chromosomal features described above. Feature types retrieved by YM include Uncharacterized, Dubious and Verified ORFs, pseudogenes, transposable elements and transposable element genes, RNAs, telomeric regions, centromeres, ARS. All the features listed above have been mapped to the reference genome of S. cerevisiae and have coordinates.
A simple search interface for a predefined query is referred to as a template. YeastMine provides a variety of templates that are grouped by data type (such as Genomics, GO, phenotype). Templates can be accessed from the tool bar present in all YeastMine pages. Each Template is shown with a short description of the search performed. Templates can be constrained to a default value or to a list of related data Objects. A majority of the templates are gene-centric, i.e. they allow for the retrieval of a particular data type for a gene Object or a list of gene Objects. The default gene Object in YeastMine includes all the feature types that are present in the gene association file (GAF) (Uncharacterized and Verified ORFs, pseudogenes, transposable element genes, RNAs and genes ‘Not in Systematic Sequence of S228C’). An example of a template search that retrieves a list of genes is the ‘Chromosome→Genes’ template. Using this template, the user selects the desired chromosome from a pull-down menu and the search retrieves all gene Objects from the chromosome of choice.
Another functionality that YeastMine provides is the ability to upload, query, retrieve, download and manipulate lists of different data types. Lists can be made for any Object entity such as a list of genes or GO Term identifiers. They can be predefined by SGD, user-generated via uploading, or saved from the results of a query. The predefined lists include gene sets such as Verified ORFs, Uncharacterized ORFs, and tRNAs and are available from the ‘View’ submenu of the YeastMine Lists tab. Custom lists can be created through the ‘Upload’ submenu of the YeastMine Lists tab. Potential inputs for a custom list could be the result of a query at SGD or a list of genes identified in a genetic screen. Results from executed queries can be selected and added to a list via the ‘Create List’ option at the top of all search results.
Once a list is created, it can be used for additional queries or comparison with other lists. Lists can be used to restrict template queries to search for results relevant to that list. Templates where this option is available will have a ‘constrain in’ check box option that is followed by a pull-down menu populated by the SGD premade lists and any lists created by the user within their search session. Lists can be manipulated to perform functions such as joining lists, finding the intersection between lists, or subtracting lists to find features unique for some desired characteristics (shown in the Editing Templates video tutorial). In addition several widgets are available to analyze the lists further. The GO enrichment widget, for example, determines statistically significant enrichment of GO terms for a list of genes (shown in the GO Term enrichment video tutorial).
In addition to searching YeastMine with pre-set queries, it is possible to modify any existing query (template), or even to build one from scratch using the Query Builder function. In the Query Builder tab, the Model Browser displays the data present in YeastMine in an easy to navigate form and can be used to select and build a new or edit a predefined query. A new query can be built starting with any YeastMine data object such as Gene or GO annotation or Phenotype. The default Gene object in YeastMine mirrors the classifications of genes defined by SGD such as Verified ORFs, Uncharacterized ORFs, ‘Not in Systematic Sequence of S228C’. Similarly, the default Phenotype object mirrors all the attributes that are curated and displayed in the main SGD database. Users can modify any template using ‘Edit Query’ to customize data retrieval and display. A predefined query can also be edited using the Model Browser to include or exclude data or data attributes. Query Builder allows query customization by the ability to constrain on any Object, and choice of various data output options. This enables the user to build a custom query that suits their specific data search and retrieval needs. For example, if one has a list of genes that have correlated gene expression and would like to download the GO Biological Process annotations for those genes, it is fairly straightforward to modify an existing template to get these data. After saving the genes from a microarray cluster as a list using the List feature, one can go to the ‘Gene-->GO terms’ template, restrict the query to use the saved list and then by editing the query using the Model Browser, add constraints to the Ontology Name Space to retrieve just the Biological Process annotations.
The Model Browser displays the data present in YeastMine in an easy to navigate form and can be used to select and build a new or edit a predefined query. A new query can be built starting with any YeastMine data object such as Gene or GO annotation or Phenotype. The Model Browser can be accessed by clicking on the Query Builder tab and it appears on the left side of the page.
The Regions option allows you to search for certain feature types (ORFs, tRNA etc) within a specified chromosomal region of the genome. Chromosomal coordinates can be typed/pasted in the text box, or can be uploaded from a file. In addition, features in a flanking region can also be retrieved using this option.
All queries and lists can be saved for use in future YeastMine sessions by creating a personalized ‘MyMine’ account. MyMine creates a private workspace for the user to create and save queries, templates and lists.
The data columns in all of the result reports are customizable, enabling the user to choose exactly what type of information is in the output of a search. This feature is available both from the record results page and through the Query Builder. It is also possible to export all results either as a list for further querying within YeastMine, as a table to the Galaxy tool, or as a file to your desktop. YeastMine supports data download in multiple formats (tab delimited, comma separated, excel) and GFF3 format for sequence related data.
The SGD transcriptome dataset is an integrated compilation of data from 11 transcriptomic publications:
- Pelechano 2013
- Nagalakshmi 2008
- Neil 2010
- Zhang 2005
- VanDijk 2011
- Ozsolak 2010
- Miura 2006
- Yassour 2009
- Yassour 2010
- Lardenois 2011
- Xu 2009
Each transcript represents a full length isoform from the Pelechano et al (2013) dataset that covers a region corresponding to a single ORF and both of its ends have been detected by at least one other independent experiment (publication and labs). The other 10 datasets were used as supporting evidence for the transcript UTR feature. We have provided a count of how many times each unique UTR has been detected, as well as the corresponding pubmed identities for those publications from which the data were extracted. Additional metadata has been included which identifies the growth conditions under which the transcript isoform was detected (initially only glucose and galactose).
We integrated the multiple data sources by mapping reported 5’ and 3’ UTR features to full length mRNA transcripts that were sequenced (paired end TIF-seq). Because the individual datasets have been produced using several different technologies, methods and platforms, there is some variability in the precise coordinates of feature start and end locations. According to Pelechano et al (2013), there is a very consistent spacing of 8 nucleotides between unique transcript start/end sites. To optimize our mapping of UTR features to full length transcripts, we allowed a 7 nucleotide window (3 bp either side of a reported feature edge), which provides good correlation of features across the datasets.
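The window rule can be illustrated with a short sketch: a reported feature edge supports a transcript edge when the two lie within 3 bp of each other, which gives the 7 nucleotide window. The function names and coordinates below are hypothetical, and this is not the code used to build the dataset.

```python
from typing import Iterable

WINDOW = 3  # bases allowed on either side of an edge (7 nt window in total)


def edges_match(transcript_edge: int, reported_edge: int, window: int = WINDOW) -> bool:
    """True when a reported UTR edge falls within the window around a transcript edge."""
    return abs(transcript_edge - reported_edge) <= window


def count_support(transcript_edge: int, reported_edges: Iterable[int]) -> int:
    """Count how many reported edges support a given transcript edge."""
    return sum(edges_match(transcript_edge, edge) for edge in reported_edges)


# Hypothetical coordinates: three of the five reported edges fall inside the window.
print(count_support(1000, [997, 1000, 1003, 1004, 990]))
```

Because the window is narrower than the 8 nucleotide spacing reported between unique start/end sites, a reported edge cannot fall within the window of two neighboring sites at once.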
As more transcriptome data are published and made publicly available, this integrated dataset can be further expanded as more evidence for less abundant transcript isoforms becomes available. It is also our objective to include more data from experiments that have investigated different growth conditions and treatments, which will lead to a broader knowledge of the Saccharomyces global transcriptome.
- Glucose count: Number of times this transcript isoform was detected in glucose.
- Galactose count: Number of times this transcript isoform was detected in galactose.
- Note Covering one intact ORF: The full length of this transcript was sequenced from both ends and maps to a single ORF in the S. cerevisiae genome.
- Five Prime Score: Number of experiments in which the 5 prime UTR of this transcript has been reported.
- Three Prime Score: Number of experiments in which the 3 prime UTR of this transcript has been reported.
- Five Prime Data Set: The publication from which there is supporting five prime UTR evidence for this transcript isoform.
- Three Prime Data Set: The publication from which there is supporting three prime UTR evidence for this transcript isoform.