New & Noteworthy

Where’s That Protein?

July 1, 2015

Waldo will always be hard to find, but we now know exactly where to find more than 4,000 S. cerevisiae proteins, thanks to new methods and an analysis pipeline. Image by William Murphy via Wikimedia Commons

You might be familiar with the Where’s Waldo book series, especially (but not necessarily) if you have kids. They challenge the reader to find Waldo within huge, intricately drawn groups of people. Even though Waldo has his distinctive characteristics—glasses and a striped shirt and hat—he can be very hard to find.

Now imagine that the drawings shift under different conditions, so that Waldo could be in any of several places at different times. And imagine that you’re not just looking for Waldo, but also for thousands of other unique individuals—all tagged in the same way. This is the challenge faced by researchers who want to know where each protein in a cell is located and how its location and abundance respond to different environments.

But, as genetic, robotic, microscopic, and computational tools get more and more sophisticated, it’s becoming possible to pinpoint Waldo and his companions even as they move around within the jam-packed yeast cell.

In two new papers, scientists from the University of Toronto describe a huge effort that entailed over 9 billion quantitative measurements to find the location and measure the abundance of more than 4,000 S. cerevisiae proteins. Chong and colleagues wrote in Cell about the approach and experimental methods, while Koh and colleagues published in G3 about the computational methods and the database that houses all the data, called CYCLoPs for Collection of Yeast Cells and Localization Patterns.

This work couldn’t have been done without a valuable resource that was created some years ago: the yeast GFP collection. It’s a set of strains, each with the green fluorescent protein gene fused to the 3’ end of one open reading frame to express a GFP fusion protein from the ORF’s native promoter. Not every yeast protein can be detected this way: some are expressed too weakly, while others may actually be destabilized by their GFP tags. Still, more than 4,100 of these fusion genes—71% of the proteome—give a visible GFP signal in the cell.

The researchers started with these ~4,100 strains and transformed each with a plasmid expressing red fluorescent protein. This allowed them to visualize the boundaries of each cell. Then they got to work, taking pictures of at least 200 cells of each strain and developing an automated pipeline to analyze them. They ended up analyzing 300,000 micrographs of more than 20 million cells, beating the few dozen Where’s Waldo books by a long shot!

The scientists looked at each protein in wild type, in a mutant strain, and in the presence of two drugs. The mutant strain they studied was deleted for RPD3, which encodes a lysine deacetylase that regulates the stability and interactions of histones and other proteins. The drug treatments were done with several different concentrations of rapamycin (an inhibitor of the TORC1 complex, which is an important regulator of cell growth) or hydroxyurea (a DNA replication inhibitor).

The end result was an enormous collection of data, now stored in the CYCLoPs database, that shows the abundance of each protein in each of 16 cellular compartments under all of these different conditions. These data are much more quantitative and consistent than any protein abundance or localization data that had been obtained before. They are stored in such a way that measurements within single cells can be accessed, and the database can be searched by patterns of changes in localization or abundance as well as for data on a particular protein.

The authors came up with some innovative methods for visualizing this immense dataset to get a high-level overview. One of their most surprising findings was just how many proteins localize to multiple places. We tend to think of the cell as a tidy place where each protein has one particular location, but Chong and colleagues found that it’s extremely common for proteins to be in several spots.

Most often, when proteins are present in more than one place, those places are the nucleus and the cytoplasm. Some proteins had already been shown in small-scale studies to be present in both compartments, or to shuttle between them. But the authors saw an astounding 1,029 proteins localizing to both the nucleus and cytoplasm under standard conditions in wild-type cells.

Not counting the proteins in the nucleus and cytoplasm, another 511 proteins localized to more than one place. Some were seen in up to five different subcellular compartments.

The proteins with multiple locations, as a group, were more likely than the average protein to be phosphorylated. This made sense, because phosphorylation of proteins is known to regulate their localization. And many of these proteins themselves had regulatory roles, controlling processes such as cell division.

The fact that data were collected from single cells means that we can use them to uncover the dynamics of protein movement. For example, if a protein was scored as localizing to both the nucleus and the cytoplasm, does that mean there’s a pool of it in both places at all times, or does it move back and forth? The single-cell data for two representative proteins, Mcm2 and Whi5, showed clearly that any one cell has each of these proteins in either the nucleus or cytoplasm, but not both. But some other proteins hang out in both places at once. And the dynamics of still more roving proteins are just waiting to be revealed.
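To see why single-cell measurements matter here, consider a toy calculation. This is only an illustrative sketch, not the CYCLoPs analysis pipeline: the per-cell nuclear signal fractions are made up, and the 0.3 cutoff for calling a compartment "occupied" is an arbitrary assumption. Both hypothetical populations have the same average (half the signal in the nucleus), yet the single-cell view tells them apart.

```python
from collections import Counter

# Assumed cutoff (illustration only): a compartment counts as occupied
# when it holds at least this fraction of the cell's GFP signal.
OCCUPIED = 0.3


def classify_cell(nuclear_fraction):
    """Label one cell from the fraction of its signal found in the nucleus."""
    cytoplasmic_fraction = 1.0 - nuclear_fraction
    in_nucleus = nuclear_fraction >= OCCUPIED
    in_cytoplasm = cytoplasmic_fraction >= OCCUPIED
    if in_nucleus and in_cytoplasm:
        return "both"
    return "nucleus" if in_nucleus else "cytoplasm"


def summarize(nuclear_fractions):
    """Count how many cells fall into each localization class."""
    counts = Counter(classify_cell(f) for f in nuclear_fractions)
    return {label: counts.get(label, 0) for label in ("nucleus", "cytoplasm", "both")}


# Made-up populations: each has a mean nuclear fraction of 0.5.
either_or = [0.95, 0.05, 0.9, 0.1, 0.92, 0.08]   # each cell picks one compartment
simultaneous = [0.5, 0.45, 0.55, 0.6, 0.4, 0.5]  # each cell has signal in both

print(summarize(either_or))
print(summarize(simultaneous))
```

A population average would report "nucleus and cytoplasm" for both cases. Only the per-cell counts separate the either/or pattern from the simultaneous one.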

Researchers will be mining the CYCLoPs resource to find detailed information about specific proteins, pathways, and processes for years to come. The data gathered in the rpd3 mutant and under rapamycin and hydroxyurea treatment served as proof of principle that the system can be used to assess the effects of a variety of mutations and drugs.

So this study puts a spotlight on Waldo in each picture and makes it simple to find him and his friends. This mass of data on where proteins are and how they move around has far-reaching implications for yeast systems biology, and the methodology can now be applied to cells of other organisms as well. In the coming weeks, we’ll make it even simpler for you to access these data from SGD, by adding links for individual proteins to the CYCLoPs database.

by Maria Costanzo, Ph.D., Senior Biocuration Scientist, SGD

New SGD Help Video: Yeast-Human Functional Complementation Data

June 30, 2015

Yeast and humans diverged about a billion years ago, but there’s still enough functional conservation between some pairs of yeast and human genes that they can be substituted for each other. How cool is that?! Which genes are they? What do they do?

This two-minute video explains how to find, search, and download the yeast-human functional complementation data in SGD. You can find help with many other aspects of SGD in the tutorial videos on our YouTube channel. And as always, please be sure to contact us with any questions or suggestions.

The Sounds of Silencing

June 17, 2015

For centuries, we thought of the universe as an empty, eerily silent place. Turns out we were dead on when it came to the emptiness, not so much when it came to the silence.

Despite more and more powerful equipment, SETI has yet to find any meaningful radio signals coming from the stars. Yeast research is in a better position: new techniques applied to telomeric gene expression now make sense of the signals. Image by European Southern Observatory (ESO) via Wikimedia Commons

Once we invented devices that could detect electromagnetic radiation—starting with the Tesla coil receiver in the 1890s—we began to realize what a noisy place the universe really is. And now with modern radio telescopes becoming more and more sensitive, we know there is a cacophony of signals out there (although the Search for Extraterrestrial Intelligence has yet to find any non-random patterns).

The ends of chromosomes, telomeres, have also long been thought to be largely silent in terms of gene expression. But a new paper in GENETICS by Ellahi and colleagues challenges that idea.

Much like surveying the universe with a high-powered radio telescope, the researchers used modern techniques to make a comprehensive survey of the telomeric landscape–and saw that the genes were not so silent. Their work revealed that there’s a lot more gene expression going on at telomeres than we thought before.

It also gave us some fascinating insights into the role of the Sir proteins, founding members of the conserved sirtuin family that is implicated in aging and cancer.

Telomeres are special structures that “cap” the ends of linear chromosomes to protect the genes near the ends from being lost during DNA replication, something like aglets, those plastic tips that keep the ends of your shoelaces from fraying. They have characteristic DNA sequence elements that we don’t have space to describe here (but you can find a short summary in SGD).

Classical genetics experiments in Drosophila fruit flies showed that telomeres had a silencing effect on the genes near them, and early work in yeast seemed to confirm this. Reporter genes became transcriptionally silenced when they were placed near artificial constructs that mimicked telomere sequences.

This early work was solid, but had a few limitations. The artificial telomere constructs were, well, artificial; some of the reporter genes encoded enzymes, such as Ura3, that affected overall cellular metabolism; and the studies tended to look at just one or a few telomeres.

To get the whole story, Ellahi and colleagues decided to look very carefully at the telomeric universe of S. cerevisiae. First, they used ChIP-seq to look at the physical locations of three proteins, Sir2, Sir3, and Sir4, on chromosomes near the telomeres.

These proteins, first characterized and named Silent Information Regulators for their role in silencing yeast’s mating type cassettes, had been seen to also mediate telomeric silencing. Scientists had hypothesized that they might be present at telomeres in a gradient, strongly repressing genes close to the chromosomal ends and petering out with increasing distance from the telomere. 

Ellahi and coworkers re-analyzed recent ChIP-seq data from their group to find where the Sir proteins were binding within the first and last 20 kb regions of every chromosome. These 20 kb regions included the telomere and the so-called subtelomeric region where genes are thought to be silenced. They found all three Sir proteins at all 32 natural telomeres.

However, the Sir proteins were not uniformly distributed across the telomeres, but rather occupied distinct positions. Typically, all three were in the same position, as would be expected since they form a complex. And they were definitely not in a gradient along the telomere.

Next the researchers asked whether gene expression was truly silenced in that subtelomeric region. They used mRNA-seq to measure gene expression from the ends of chromosomes in wild type or sir2, sir3, or sir4 null mutants.

They found that contrary to expectations, there is actually a lot of transcription going on near telomeres, even in the closest 5 kb region. The levels are lower than in other parts of the genome, but that can be partly explained by the fact that open reading frames are less dense in these regions. And only 6% of genes are silenced in a Sir-dependent manner.

The sensitivity of mRNA-seq allowed Ellahi and colleagues to uncover new patterns of gene expression in this work. They were able to detect very low-level transcription from some of the telomeric repetitive elements. Also, because the SIR genes are involved in mating type regulation, the mRNA-seq data from the sir mutants revealed a whole new set of genes that are differentially expressed in different cell types (haploids of mating types a and α, or a/α diploids).

The researchers point out that their work raises the question of why the cell would use the Sir proteins to repress transcription of a few subtelomeric genes. Wouldn’t it be more straightforward if these genes just had weaker promoters to keep their expression low?

They hypothesize that Sir repression could actually be part of a stress response mechanism, allowing a few important genes to be turned on strongly when needed. This idea could have intriguing implications for the role of Sir family proteins in aging and cancer in larger organisms. 

So, neither the universe nor the ends of our chromosomes are as silent as we thought. But unlike the disappointed SETI researchers, biologists studying everything from yeast to humans can now build on this large quantity of meaningful data from S. cerevisiae telomeres. 

by Maria Costanzo, Ph.D., Senior Biocuration Scientist, SGD

Yeast-Human Functional Complementation Data Now in SGD

June 10, 2015

Yeast and humans diverged about a billion years ago. So if there’s still enough functional conservation between a pair of similar yeast and human genes that they can be substituted for each other, we know they must be critically important for life. An added bonus is that if a human protein works in yeast, all of the awesome power of yeast genetics and molecular biology can be used to study it.

To make it easier for researchers to identify these “swappable” yeast and human genes, we’ve started collecting functional complementation data in SGD. The data are all curated from the published literature, via two sources. One set of papers was curated at SGD, including the recent systematic study of functional complementation by Kachroo and colleagues.  Another set was curated by Princeton Protein Orthology Database (P-POD) staff and is incorporated into SGD with their generous permission.

As a starting point, we’ve collected a relatively simple set of data: the yeast and human genes involved in a functional complementation relationship, with their respective identifiers; the direction of complementation (human gene complements yeast mutation, or vice versa); the source of curation (SGD or P-POD); the PubMed ID of the reference; and an optional free-text note adding more details. In the future we’ll incorporate more information, such as the disease involvement of the human protein and the sequence differences found in disease-associated alleles that fail to complement the yeast mutation.
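If you plan to work with these data in a script, it may help to picture each entry as a small record. The sketch below is just one possible way to model the fields listed above. It is not SGD's actual file layout: the class name, field names, and direction labels are assumptions, and the example values are obvious placeholders rather than real curated data.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the direction of complementation.
HUMAN_COMPLEMENTS_YEAST = "human complements yeast"
YEAST_COMPLEMENTS_HUMAN = "yeast complements human"


@dataclass(frozen=True)
class ComplementationRecord:
    """One yeast-human functional complementation relationship."""
    yeast_gene: str
    yeast_id: str
    human_gene: str
    human_id: str
    direction: str              # one of the two labels above
    source: str                 # "SGD" or "P-POD"
    pubmed_id: str              # PubMed ID of the reference
    note: Optional[str] = None  # optional free-text note


def by_direction(records, direction):
    """Return only the records with the requested direction."""
    return [r for r in records if r.direction == direction]


# Placeholder records for illustration only; none of these values are real.
records = [
    ComplementationRecord("YFG1", "yeast-id-1", "HUMANGENE1", "human-id-1",
                          HUMAN_COMPLEMENTS_YEAST, "SGD", "00000001"),
    ComplementationRecord("YFG2", "yeast-id-2", "HUMANGENE2", "human-id-2",
                          YEAST_COMPLEMENTS_HUMAN, "P-POD", "00000002",
                          note="placeholder note"),
]

for record in by_direction(records, HUMAN_COMPLEMENTS_YEAST):
    print(record.yeast_gene, "<-", record.human_gene, f"(curated by {record.source})")
```

Keeping the note optional mirrors the description above, where only some entries carry extra free-text detail.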

You can access these data in two ways: using two new templates in YeastMine, our data warehouse; or via our Download page. Please take a look, let us know what you think, and point us to any published data that are missing. We always appreciate your feedback!

Using YeastMine to Access Functional Complementation Data

YeastMine is a versatile tool that lets you customize searches and create and manipulate lists of search results. To help you get started with YeastMine we’ve created a series of short video tutorials explaining its features.

Gene –> Functional Complementation template

This template lets you query with a yeast gene or list of genes (either your own custom list, or a pre-made gene list) and retrieve the human gene(s) involved in cross-species complementation along with all of the data listed above.

Human Gene –> Functional Complementation template

This template takes either human gene names (HGNC-approved symbols) or Entrez Gene IDs for human genes and returns the yeast gene(s) involved in cross-species complementation, along with the data listed above. You can run the query using a single human gene as input, or create a custom list of human genes in YeastMine for the query. We’ve created two new pre-made lists of human genes that can also be used with this template. The list “Human genes complementing or complemented by yeast genes” includes only human genes that are currently included in the functional complementation data, while the list “Human genes with yeast homologs” includes all human genes that have a yeast homolog as predicted by any of several methods.

Downloading Functional Complementation Data

If you’d prefer to have all the data in one file, simply visit our Curated Data download page and download the file “”.
