New & Noteworthy

The Gift for the Man Who Has Everything

July 8, 2015

Gifts can be hard to buy for some people. They have everything they need and not many outside interests. What to do?

Having trouble finding a personal gift for that impossible-to-buy-for person? How about a vanity protein with their name written right into the amino acid sequence? Image by D. Barry Starr

You could name a star after them or get them some knick knack they don’t need. Or you could design a personalized protein that has their name in it, solve the structure and present them with the picture.

This is what Deiss and coworkers did to celebrate the 50th birthday of their colleague Andrei N. Lupas, a key figure in studying coiled-coil proteins. They created a personalized protein based on Gcn4 from Saccharomyces cerevisiae. And of course Gcn4 is a coiled-coil protein!

Coiled-coil proteins are the perfect clay for biosculpting a personalized protein. They follow a relatively simple set of rules which makes it easy to predict how they will fold. There isn’t much of the “protein folding problem” with these user-friendly proteins.

Basically these proteins consist of repeated 7 amino acid motifs (heptads) that together form an alpha helix. They have hydrophobic residues down one face of the helix so that they will tend to oligomerize with each other to keep the hydrophobic residues away from the water. These helices spontaneously coil around each other like the strands of a rope (hence their name).

The 7 amino acids of a repeat are usually represented as a-b-c-d-e-f-g and are arranged in the pattern hxxhcxc, with h being hydrophobic residues, c being charged residues and x being most any other amino acid. So a and d must be hydrophobic, and e and g charged. That’s pretty much it!
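The rule is simple enough to write down as a tiny check. The sketch below is only an illustration: the residue groupings (`HYDROPHOBIC`, `CHARGED`) are a simplified textbook classification assumed here, not something taken from the paper, and the function name `heptad_mismatches` is made up for this example.

```python
# Heptad positions a-g follow the pattern h x x h c x c:
# h = hydrophobic, c = charged, x = almost anything.
PATTERN = "hxxhcxc"

# Assumed, simplified residue classes (one-letter codes).
HYDROPHOBIC = set("AILMFVWY")
CHARGED = set("DEKR")


def heptad_mismatches(heptad):
    """Return the positions (a-g) whose residue breaks the h/c rule."""
    if len(heptad) != 7:
        raise ValueError("a heptad has exactly seven residues")
    bad = []
    for pos, kind, res in zip("abcdefg", PATTERN, heptad.upper()):
        if kind == "h" and res not in HYDROPHOBIC:
            bad.append(pos)
        elif kind == "c" and res not in CHARGED:
            bad.append(pos)
    return bad


print(heptad_mismatches("LKELEEK"))  # an idealized heptad: no mismatches
print(heptad_mismatches("LKEPEEK"))  # proline at d breaks the rule
```

An empty list means the seven residues fit the pattern; anything else names the positions that would need a closer look.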

Deiss and coworkers used the name Andrei N. Lupas to create a personalized coiled coil. They replaced 12 amino acids in Gcn4 with the amino acids represented by the letters in his name. Well, they were able to do that for most of the letters.

First off, they had to Roman things up a bit and turn the U into a V (none of the 20 standard amino acids has the single-letter code U). So here is the amino acid sequence they used and how it lines up with the 7-amino acid repeats:

Residue:  A N D R E I N L V P A S
Position: c d e f g a b c d e f g

In this arrangement, the residues at the hydrophobic positions (a and d) are asparagine, isoleucine, and valine, and the residues at the charged positions (e and g) are aspartic acid, glutamic acid, proline, and serine. Obviously the last two are not optimal, especially the proline. Proline has an especially rigid conformation and is known to wreak havoc with alpha helices.
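One way to see how the twelve residues sit on the heptad positions is to try all seven possible registers and keep the one that agrees with the residues named in the paragraph above for the a/d and e/g positions. The Python sketch below does that. The function names (`register`, `matching_offsets`) are made up for illustration, and the expected residue sets are copied from the text here rather than from the paper itself.

```python
NAME_SEQ = "ANDREINLVPAS"  # "Andrei N. Lupas" with the U written as V
POSITIONS = "abcdefg"

# Residues the text places at the hydrophobic (a, d) and charged (e, g) positions.
EXPECTED_AD = sorted("NIV")
EXPECTED_EG = sorted("DEPS")


def register(seq, offset):
    """Pair each residue with its heptad position for a given starting offset."""
    return [(res, POSITIONS[(i + offset) % 7]) for i, res in enumerate(seq)]


def matching_offsets(seq):
    """Return every offset whose a/d and e/g residues match the text."""
    hits = []
    for offset in range(7):
        reg = register(seq, offset)
        ad = sorted(r for r, p in reg if p in "ad")
        eg = sorted(r for r, p in reg if p in "eg")
        if ad == EXPECTED_AD and eg == EXPECTED_EG:
            hits.append(offset)
    return hits


for offset in matching_offsets(NAME_SEQ):
    reg = register(NAME_SEQ, offset)
    print("".join(r for r, _ in reg))
    print("".join(p for _, p in reg))
```

Under these assumptions only one register fits, and it puts the proline at position e.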

When the authors analyzed the protein, they found that as predicted, the proline disrupted the part of the alpha helix with which it was associated. But not enough to completely destroy the coiled coil structure. X-ray diffraction showed that this protein was still able to trimerize properly. They had created a distorted but functional personalized protein. What other kind would anyone want!

And it isn’t as if proline is completely absent from the heptad repeats of coiled-coil proteins. A quick search by the authors found two viral fusion proteins, 1ZTM and 3RRT, that could form a trimer even though they too had prolines. In both of these proteins the proline is in the f position.

They also found 4 dimers with a proline in a heptad repeat. In these cases the proline is at b or c. So no known natural coiled-coil proteins have a proline at the e position. Talk about personalized!

How cool is all of this, and who wouldn’t want a protein of their very own? Unfortunately, not everyone can easily have one.

For example, President Barack Obama would have real trouble since none of the 20 standard amino acids is designated with a B or an O and there is no obvious way to transform these letters into ones that are present in the single letter code. Jeb Bush is out too, but maybe we can do something with Hillary Clinton. Let’s see if we can line up the amino acids of her first name to create a personalized Gcn4 just for her.

“HILLARY” isn’t too bad by itself. All the letters are amino acids (yay) and, if the heptad starts at the I, a and d are hydrophobic (isoleucine and alanine). Arginine works very well for e and, while probably not perfect, histidine isn’t too bad for g (the H lands on the g position just ahead of a). The tyrosine at position f is not ideal either but is way better than a proline. This thing might replace one heptad repeat in Gcn4 without causing too many problems.

So what about your name? Can you turn yours into a heptad repeat to create your own personalized Gcn4? 
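A quick first test for any name is whether its letters exist in the one-letter amino acid code at all. Here is a minimal Python sketch of that check. It assumes only the 20 standard amino acids count and applies the same U-to-V swap described above; the function name `missing_letters` is invented for this example.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def missing_letters(name, u_to_v=True):
    """Return the letters in a name that have no standard amino acid code."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if u_to_v:
        # The "Roman" trick: write U as V.
        letters = ["V" if ch == "U" else ch for ch in letters]
    return sorted({ch for ch in letters if ch not in STANDARD})


for name in ("Andrei N. Lupas", "Barack Obama", "Jeb Bush", "Hillary"):
    problems = missing_letters(name)
    verdict = "OK" if not problems else "no code for " + ", ".join(problems)
    print(f"{name}: {verdict}")
```

A name that passes this test still has to put sensible residues at a, d, e, and g, so it is only the first hurdle.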

by D. Barry Starr, Ph.D., Director of Outreach Activities, Stanford Genetics

Network Maintenance at SGD on July 15, 2015

July 7, 2015

The SGD website and all its resources (Download Server, GBrowse, SPELL, YeastMine, Pathway Tools, and Textpresso) will be unavailable on Wednesday, July 15, 2015 from 2:30-4:30 pm PDT (5:30-7:30 pm EDT, 9:30-11:30 pm GMT, 6:30-8:30 am Japan) for network maintenance. We will make every effort to minimize any downtime associated with this maintenance. We apologize for any inconvenience this may cause, and thank you for your patience and understanding.

SGD Help Video: YeastMine is Awesome!

July 7, 2015

If you’re not already using YeastMine to answer all your questions about the Saccharomyces cerevisiae genome and the gene products it encodes…you should be!

This versatile tool lets you slice and dice data from SGD in any way you choose. You can ask questions like “How many proteins between 25 and 35 kDa in size are integral to the nuclear membrane?” or “Which genes can mutate to confer oxidative stress resistance, and what biological processes are they involved in?”

Start with this video to see a quick sample of three cool features in YeastMine.

Where’s That Protein?

July 1, 2015

Waldo will always be hard to find, but we now know exactly where to find more than 4,000 S. cerevisiae proteins, thanks to new methods and an analysis pipeline. Image by William Murphy via Wikimedia Commons

You might be familiar with the Where’s Waldo book series, especially (but not necessarily) if you have kids. They challenge the reader to find Waldo within huge, intricately drawn groups of people. Even though Waldo has his distinctive characteristics—glasses and a striped shirt and hat—he can be very hard to find.

Now imagine that the drawings shift under different conditions, so that Waldo could be in any of several places at different times. And imagine that you’re not just looking for Waldo, but also for thousands of other unique individuals—all tagged in the same way. This is the challenge faced by researchers who want to know where each protein in a cell is located and how its location and abundance respond to different environments.

But, as genetic, robotic, microscopic, and computational tools get more and more sophisticated, it’s becoming possible to pinpoint Waldo and his companions even as they move around within the jam-packed yeast cell.

In two new papers, scientists from the University of Toronto describe a huge effort that entailed over 9 billion quantitative measurements to find the location and measure the abundance of more than 4,000 S. cerevisiae proteins. Chong and colleagues wrote in Cell about the approach and experimental methods, while Koh and colleagues published in G3 about the computational methods and the database that houses all the data, called CYCLoPs for Collection of Yeast Cells and Localization Patterns.

This work couldn’t have been done without a valuable resource that was created some years ago: the yeast GFP collection. It’s a set of strains, each with the green fluorescent protein gene fused to the 3’ end of one open reading frame to express a GFP fusion protein from the ORF’s native promoter. Not every yeast protein can be detected this way: some are expressed too weakly, while others may actually be destabilized by their GFP tags. Still, more than 4,100 of these fusion genes—71% of the proteome—give a visible GFP signal in the cell.

The researchers started with these ~4,100 strains and transformed each with a plasmid expressing red fluorescent protein. This allowed them to visualize the boundaries of each cell. Then they got to work, taking pictures of at least 200 cells of each strain and developing an automated pipeline to analyze them. They ended up analyzing 300,000 micrographs of more than 20 million cells, beating the few dozen Where’s Waldo books by a long shot!

The scientists looked at each protein in wild type, in a mutant strain, and in the presence of two drugs. The mutant strain they studied was deleted for RPD3, which encodes a lysine deacetylase that regulates the stability and interactions of histones and other proteins. The drug treatments were done with several different concentrations of rapamycin (an inhibitor of the TORC1 complex, which is an important regulator of cell growth) or hydroxyurea (a DNA replication inhibitor).

The end result was an enormous collection of data, now stored in the CYCLoPs database, that shows the abundance of each protein in each of 16 cellular compartments under all of these different conditions. These data are much more quantitative and consistent than any protein abundance or localization data that had been obtained before. They are stored in such a way that measurements within single cells can be accessed, and the database can be searched by patterns of changes in localization or abundance as well as for data on a particular protein.

The authors came up with some innovative methods for visualizing this immense dataset to get a high-level overview. One of their most surprising findings was just how many proteins localize to multiple places. We tend to think of the cell as a tidy place where each protein has one particular location, but Chong and colleagues found that it’s extremely common for proteins to be in several spots.

Most often, when proteins are present in more than one place, those places are the nucleus and the cytoplasm. Some proteins had already been shown in small-scale studies to be present in both compartments, or to shuttle between them. But the authors saw an astounding 1,029 proteins localizing to both the nucleus and cytoplasm under standard conditions in wild-type cells.

Not counting the proteins found in both the nucleus and cytoplasm, another 511 proteins localized to more than one place. Some were seen in up to five different subcellular compartments.

The proteins with multiple locations, as a group, were more likely than the average protein to be phosphorylated. This made sense, because phosphorylation of proteins is known to regulate their localization. And many of these proteins themselves had regulatory roles, controlling processes such as cell division.

The fact that data were collected from single cells means that we can use them to uncover the dynamics of protein movement. For example, if a protein was scored as localizing to both the nucleus and the cytoplasm, does that mean there’s a pool of it in both places at all times, or does it move back and forth? The single-cell data for two representative proteins, Mcm2 and Whi5, showed clearly that any one cell has each of these proteins in either the nucleus or cytoplasm, but not both. But some other proteins hang out in both places at once. And the dynamics of still more roving proteins are just waiting to be revealed.

Researchers will be mining the CYCLoPs resource to find detailed information about specific proteins, pathways, and processes for years to come. The data gathered in the rpd3 mutant and under rapamycin and hydroxyurea treatment served as proof of principle that the system can be used to assess the effects of a variety of mutations and drugs.

So this study puts a spotlight on Waldo in each picture and makes it simple to find him and his friends. This mass of data on where proteins are and how they move around has far-reaching implications for yeast systems biology, and the methodology can now be applied to cells of other organisms as well. In the coming weeks, we’ll make it even simpler for you to access these data from SGD, by adding links for individual proteins to the CYCLoPs database.

by Maria Costanzo, Ph.D., Senior Biocuration Scientist, SGD
