April 17, 2018
1,011. That’s the number of different Saccharomyces cerevisiae yeast strains that were whole-genome sequenced and phenotyped by a team of researchers jointly led by Joseph Schacherer and Gianni Liti, published this week in Nature (Peter et al., 2018; data at: bit.ly/1011genomesAtSGD).
Scrupulously gathering isolates of S. cerevisiae from as many diverse geographical locations and ecological niches as possible, the authors and their collaborators plucked yeast cells not only from the familiar wine, beer and bread sources, but also from rotting bananas, sea water, human blood, sewage, termite mounds, and more. The authors then surveyed the evolutionary relationships among the strains to describe the worldwide population distribution of this species and deduce its historical spread.
They found that the greatest amount of genome sequence diversity existed among the S. cerevisiae strains collected from Taiwan, mainland China, and other regions of East Asia. This means that in all likelihood the geographic origin of S. cerevisiae lies somewhere in East Asia. According to the authors, our budding yeast friend began spreading around the globe about 15,000 years ago, undergoing several independent domestication events during its worldwide journey. For example, it turns out that wine yeast and sake yeast were domesticated from different ancestors, thousands of years apart from each other. Whereas genomic markers of domestication appeared about 4,000 years ago in sake yeast, such markers appeared in wine yeast only 1,500 years ago.
Additionally, it appears that S. cerevisiae has interbred very frequently with other Saccharomyces species, especially S. paradoxus, but that most of these interspecific hybridization events occurred after the out-of-China dispersal. This parallels the human case, in which interspecific hybridization with Neanderthals occurred only after humans migrated out of Africa.
There are many more gems to be found among the treasure trove of information in this paper. Some notable conclusions from the authors include: diploidy is the fittest ploidy state; copy number variation (CNV) is the most prevalent type of variation; most single nucleotide polymorphisms (SNPs) are very rare alleles in the population; and extensive loss of heterozygosity is observed among many strains. There are also phenotype results (fitness values) for 971 strains across 36 different growth conditions.
As is often the case for yeast, the ability to sequence and analyze whole genomes at very deep coverage has yielded broad insights into eukaryotic genome evolution. The team’s work highlights this by presenting a comprehensive view of genome evolution on many different levels (e.g., differences in ploidy, aneuploidy, genetic variants, hybridization, and introgressions) that is difficult to obtain at the same scale and accuracy for other eukaryotic organisms.
SGD is happy to announce that in conjunction with the authors and publishers, we are hosting the datasets from the paper at this SGD download site. These datasets include: the actual genome sequences of the 1,011 isolates; the list of 4,940 common “core” ORFs plus 2,856 ORFs that are variable within the population (together these make up the “pangenome”); copy number variation (CNV) data; phenotyping data for 36 conditions; SNPs and indels relative to the S288C genome; and much more. We hope that the easy availability of these large datasets will be useful to many yeast (and non-yeast) researchers, and as the authors say, will help to “guide future population genomics and genotype–phenotype studies in this classic model system.”