Background: Genes and proteins are frequently referred to by several alternative names. Applications such as manual literature search, automated text mining, named entity identification, gene/protein annotation, and the linking of knowledge from different information sources require knowledge of all names in use for a given gene or protein. Various organism-specific and general public databases aim to organize knowledge about genes and proteins, and these databases can be used to derive gene and protein name dictionaries. So far, little is known about how these databases differ in size, ambiguity, and overlap.
Results: We compiled five gene and protein name dictionaries for each of five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, as well as with respect to a lexicon of common English words and domain-related non-gene terms, and we compared the data sources in terms of the size of the extracted dictionaries and the overlap of synonyms between them. The study shows that the number of genes/proteins and synonyms covered by individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between organisms. Furthermore, it shows that, despite considerable co-curation efforts, the overlap of synonyms between data sources is only moderate, and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms depends on the organism considered.
Conclusion: These results indicate that combining the data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more names in use than dictionaries obtained from individual data sources. Furthermore, curation of the combined dictionaries considerably increases their size and decreases ambiguity. The entries of the curated synonym dictionary are available for manual querying, editing, and PubMed or Google search via the ProThesaurus-wiki. For automated querying via custom software, we offer a web service and an exemplary client application.
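The three measures named in the Results (ambiguity within a dictionary, ambiguity against a lexicon of common English words, and synonym overlap between data sources) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the case-insensitive normalization, the use of a Jaccard ratio for overlap, and the toy dictionaries `db_a` and `db_b` are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize(name):
    # Assumption: names are compared case-insensitively after trimming.
    return name.strip().lower()


def intra_ambiguity(dictionary):
    """Return synonyms that refer to more than one gene ID in one dictionary."""
    owners = defaultdict(set)
    for gene_id, synonyms in dictionary.items():
        for synonym in synonyms:
            owners[normalize(synonym)].add(gene_id)
    return {s: ids for s, ids in owners.items() if len(ids) > 1}


def lexicon_ambiguity(dictionary, lexicon):
    """Return synonyms that also occur in a lexicon of ordinary words."""
    names = {normalize(s) for syns in dictionary.values() for s in syns}
    return names & {normalize(word) for word in lexicon}


def synonym_overlap(dict_a, dict_b):
    """Jaccard ratio of (gene ID, synonym) pairs shared by two dictionaries."""
    pairs_a = {(g, normalize(s)) for g, syns in dict_a.items() for s in syns}
    pairs_b = {(g, normalize(s)) for g, syns in dict_b.items() for s in syns}
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


# Toy data, invented for this example; IDs and names are hypothetical.
db_a = {"G1": ["white", "w"], "G2": ["hh", "hedgehog"], "G3": ["w"]}
db_b = {"G1": ["white", "W1"], "G2": ["hedgehog"]}
common_words = {"white", "and", "hedgehog"}

ambiguous = intra_ambiguity(db_a)
lexicon_hits = lexicon_ambiguity(db_a, common_words)
overlap = synonym_overlap(db_a, db_b)
```

In this toy case `ambiguous` flags the name "w" because it is listed for two gene IDs, `lexicon_hits` contains the two names that double as English words, and `overlap` is the fraction of all (gene ID, synonym) pairs found in both sources. Real dictionaries would also need a mapping between the identifier schemes of different databases, which this sketch sidesteps by assuming shared IDs.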