New & Noteworthy

Getting the Big Picture from 100 Genomes

May 20, 2015

Like the Peruvian Hairless dog, in some ways the S288C genome looks quite different from other members of its species. Image via Wikimedia Commons

Imagine if aliens visited the earth to learn about dogs, but they stumbled upon a colony of the very rare Peruvian Hairless. Taking a sample for DNA analysis, they would retreat to their home planet, do their studies, and conclude that all dogs had smooth, mottled skin and a stiff mohawk—as well as whatever crazy mutations the Peruvian Hairless happens to carry. 

Until recently, S. cerevisiae researchers have been a bit like those aliens. The genomic sequence of the reference strain S288C was completed in 1996, and for a long time it was the only sequence available. Scientists knew a lot about the S288C genome, but they didn’t have any perspective on the species as a whole.

In the past few years, genomic sequences have become available from a handful of other strains. But now, as described in a new paper in Genome Research, Strope and colleagues have determined the genomic sequences of 93 additional S. cerevisiae strains to make the number an even hundred.

This collection of strains and sequences has already provided new insights into yeast phenotypic and genotypic variation, and represents an incredible resource for future studies. And the comparison with this collection of other strains suggests that in some ways, S288C may be just as unusual as the Peruvian Hairless.

With these strains and their sequences, the researchers gained a much broader perspective across the whole S. cerevisiae species. It’s as if the aliens discovered Golden Retrievers, Great Danes, Chihuahuas, and more. We only have space here to touch upon a few of the highlights.

First off, they confirmed what many yeast researchers have suspected for a while: S288C is a bit odd. We already knew that S288C carries polymorphisms in several genes that affect its phenotype. For example, the MIP1 gene in S288C encodes a mitochondrial DNA polymerase that is less efficient than in other strains, making its mitochondrial genome less stable.

Back when fewer strain sequences were available, it wasn’t clear whether the S288C polymorphisms in genes like MKT1, SSD1, MIP1, AMN1, FLO8, HAP1, BUL2, and SAL1 were the exception or the rule. With 100 genomes in hand, Strope and colleagues could see that these differences are indeed peculiar to S288C and its close relative W303. They might have arisen because of the long genetic isolation of the strains, or because of special selective pressures they faced during growth in the lab.

They also found a lot of variation in how often S. cerevisiae strains have acquired whole chromosomal regions from other Saccharomyces species. This process, known as introgression, happens when related species mate to form hybrids. Stretches of DNA that are transferred in this way are recognizable because gene order is preserved, but all the genes they contain are highly diverged.

The researchers found 141 of these regions containing 401 genes. Many showed similarity to S. paradoxus, which is known to hybridize with S. cerevisiae, but others apparently came from unknown, as yet un-sequenced Saccharomyces species. In a couple of cases that the authors looked at closely, the introgressed genes had slightly different functions from their native S. cerevisiae counterparts.

Another notable finding by Strope and colleagues concerned some genes that exist in multiple copies. The ENA genes, encoding an ATP-dependent sodium pump, are present in 3 copies in S288C (ENA1, ENA2, and ENA5), while the CUP1-1 and CUP1-2 genes, encoding metallothionein that binds to copper and mediates copper resistance, are present in 10-15 copies.

To get perspective on a whole species, you need to look at lots of different examples. Image by Sue Clark via Flickr

The sequence coverage in these regions relative to their flanking regions allowed the researchers to estimate how many repeats are present in each strain. All had between 1 and 14 copies of ENA genes and between 1 and 18 copies of CUP genes. Interestingly, the strains of clinical origin had significantly higher copy numbers of CUP genes than the non-clinical strains, suggesting that copper resistance is an important trait for virulence.
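The idea of reading copy number from relative coverage can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that reads from every repeat copy pile up on a single reference copy of the repeat unit, so that depth there scales with copy number. The function name, the use of medians, and the depth values are all hypothetical.

```python
from statistics import median


def estimate_copy_number(region_depths, flank_depths):
    """Estimate repeat copy number from read depth.

    region_depths: per-base depths over one reference copy of the repeat
    flank_depths:  per-base depths over unique flanking sequence
    Both inputs are illustrative; real data would come from an alignment.
    """
    flank = median(flank_depths)
    if flank <= 0:
        raise ValueError("flanking depth must be positive")
    # Depth ratio of repeat to single-copy flank, rounded to a whole copy count
    return round(median(region_depths) / flank)


# Made-up depths: the repeat is covered about 8 times as deeply as its flanks
flanks = [48, 50, 52, 51, 49]
repeat = [398, 405, 402, 410, 395]
print(estimate_copy_number(repeat, flanks))
```

Medians are used here only because they are less sensitive than means to a few bases with unusual depth; any robust summary would serve for the sketch.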

So, instead of being confined to the S288C genome, S. cerevisiae researchers can now get a much fuller idea of the range of genetic and phenotypic variation within the species. The strains (available at the Fungal Genetic Stock Center), along with their genome sequences (available in GenBank), are an amazing resource for classical and quantitative genetics and comparative genomics.

Unlike those aliens, we won’t end up thinking of yeast as a mostly bald dog with a mohawk. No, we will have a fuller picture of S. cerevisiae strains in all their glory.

A few technical details

In selecting the strains to sequence, Strope and colleagues chose from a wide variety of yeast cultures isolated from the environment and from hospital patients with opportunistic S. cerevisiae infections. But they faced a problem: many of the cultures had irregular numbers of chromosomes or genome rearrangements, which would complicate both interpretation of the sequence data and any future genetic analysis.

To avoid this problem, the researchers selected only strains that were able to sporulate and produce four viable spores—showing that their genomes weren’t messed up. They also wanted strains with no auxotrophies (nutritional requirements), since these can negatively affect growth and complicate the comparison of phenotypes. In some cases, they corrected specific mutations in the strains to increase their fitness.

They ended up with 93 homozygous diploid strains to sequence. Producing paired-end reads of 101 bp, they generated genome assemblies that had 22- to 650-fold coverage per strain.
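For a sense of scale, fold coverage relates to read count by the standard formula: total sequenced bases divided by genome size. The sketch below applies it to 101 bp paired-end reads. The genome size of roughly 12.1 Mb is an assumption used for illustration, not a figure from the paper, and the function names are hypothetical.

```python
import math

# Assumed approximate S. cerevisiae genome size; not taken from the paper
GENOME_SIZE_BP = 12_100_000
READ_LENGTH_BP = 101  # paired-end read length given in the text


def fold_coverage(n_read_pairs, read_length_bp=READ_LENGTH_BP,
                  genome_size_bp=GENOME_SIZE_BP):
    """Average fold coverage: total sequenced bases / genome size."""
    return 2 * n_read_pairs * read_length_bp / genome_size_bp


def read_pairs_for_coverage(target_coverage, read_length_bp=READ_LENGTH_BP,
                            genome_size_bp=GENOME_SIZE_BP):
    """Smallest number of read pairs that reaches the target coverage."""
    return math.ceil(target_coverage * genome_size_bp / (2 * read_length_bp))


# Read pairs needed at the two ends of the reported 22- to 650-fold range
for target in (22, 650):
    print(target, read_pairs_for_coverage(target))
```

This is an average over the whole genome; as the copy-number analysis shows, depth at any one locus can differ considerably from it.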

Because the sequence reads were relatively short, they didn’t provide enough information to assemble the sequence across repetitive regions. So Strope and colleagues used a genetic method to determine gene order. They crossed haploid derivatives of the strains to the reference strain S288C; if their genomes were not colinear with that of S288C, then some of the resulting spores would be inviable.

This analysis showed that 79 of the strains had chromosomes colinear to those of S288C, and allowed assembly of their genomes across multicopy sequences. The remaining strains had chromosomal translocations relative to S288C. Twelve of these carried the same reciprocal translocation between chromosomes 8 and 16.

by Maria Costanzo, Ph.D., Senior Biocuration Scientist, SGD

Categories: Research Spotlight

Tags: strains, Saccharomyces cerevisiae, genome

New Alternative Reference Genomes

December 08, 2014

At SGD, we are expanding our scope to provide annotation and comparative analyses of all major budding yeast strains, and are making progress in our move toward providing multiple reference genomes. To this end, the following new S. cerevisiae genomes have been incorporated into SGD as “Alternative References”: CEN.PK, D273-10B, FL100, JK9-3d, RM11-1a, SEY6210, SK1, Sigma1278b, W303, X2180-1A, Y55. These genomes are accessible via Sequence, Strain, and Contig pages, and are the genomes for which we have curated the most phenotype data, and for which we aim to curate specific functional information. It is important to emphasize that we are not abandoning a standard sequence; S288C is still in place as “The Reference Genome”. However, we do recognize that it is helpful for students and researchers to be able to ‘shift the reference’, selecting the genome that is most appropriate and informative for a specific area of study.

These new genome sequences have also been added to SGD’s BLAST datasets, multiple sequence alignments, the Pattern Matching tool, and the Downloads site. Please explore these new genomes, and send us your feedback.

Categories: New Data, Data updates, Sequence

Tags: strains, Saccharomyces cerevisiae, reference genome