The SGD Physico-chemical Properties page provides basic protein information calculated from the predicted primary sequence. The calculations use the ProtParam tool available at ExPASy (atomic composition, extinction coefficient, estimated half-life, instability index and aliphatic index), GCG's PEPTIDESORT (amino acid composition) and CodonW (coding region translation calculations). More detailed information about the individual properties and how they are calculated can be found in the sections below. Documentation for the calculations performed with the ProtParam tool was taken from the ProtParam references/documentation page.
The Amino Acid Composition is based on the primary sequence. The table contains three columns: the first lists both the three- and one-letter designations for the twenty amino acids, the second lists the number of residues of each amino acid present in one molecule, and the third contains the composition expressed as a percentage. Values in this table were calculated using GCG's PEPTIDESORT.
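The three columns described above can be reproduced with a short script. The following is a minimal Python sketch, not the PEPTIDESORT implementation; the function name `amino_acid_composition` and the example sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter

# One-letter to three-letter codes for the twenty standard amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def amino_acid_composition(sequence):
    """Return rows of (three-letter code, one-letter code, count, percent)."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    total = len(sequence)
    rows = []
    # Sort by three-letter code so the rows are listed alphabetically.
    for one, three in sorted(THREE_LETTER.items(), key=lambda kv: kv[1]):
        n = counts.get(one, 0)
        rows.append((three, one, n, 100.0 * n / total))
    return rows


# Hypothetical four-residue example sequence.
for three, one, n, percent in amino_acid_composition("MAAK"):
    if n:
        print(f"{three} ({one})  {n}  {percent:.1f}%")
```

Percentages are computed relative to the full sequence length, so they sum to 100 only when the sequence contains nothing but the twenty standard residues.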
The Atomic Composition Table displays the composition of the protein, with respect to the number of atoms of carbon, hydrogen, nitrogen, oxygen, and sulfur that it contains as well as the total number of atoms and the resulting formula. Values in this table were calculated using the ProtParam tool, available at ExPASy.
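As an illustration of how such a table can be derived from the sequence, the sketch below sums the atoms of the free amino acids and subtracts one water molecule per peptide bond. It is a simplified sketch and not the ProtParam code: it assumes an unmodified chain with no disulfide bonds, and the names `FREE_AA_ATOMS`, `atomic_composition` and `formula` are hypothetical.

```python
# Atom counts (C, H, N, O, S) of the free, uncharged amino acids.
FREE_AA_ATOMS = {
    "A": (3, 7, 1, 2, 0),  "R": (6, 14, 4, 2, 0), "N": (4, 8, 2, 3, 0),
    "D": (4, 7, 1, 4, 0),  "C": (3, 7, 1, 2, 1),  "Q": (5, 10, 2, 3, 0),
    "E": (5, 9, 1, 4, 0),  "G": (2, 5, 1, 2, 0),  "H": (6, 9, 3, 2, 0),
    "I": (6, 13, 1, 2, 0), "L": (6, 13, 1, 2, 0), "K": (6, 14, 2, 2, 0),
    "M": (5, 11, 1, 2, 1), "F": (9, 11, 1, 2, 0), "P": (5, 9, 1, 2, 0),
    "S": (3, 7, 1, 3, 0),  "T": (4, 9, 1, 3, 0),  "W": (11, 12, 2, 2, 0),
    "Y": (9, 11, 1, 3, 0), "V": (5, 11, 1, 2, 0),
}


def atomic_composition(sequence):
    """Return a dict of atom counts for an unmodified linear polypeptide."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    totals = [0, 0, 0, 0, 0]
    for residue in sequence:
        for k, n in enumerate(FREE_AA_ATOMS[residue]):
            totals[k] += n
    # Each peptide bond releases one water molecule (2 H and 1 O).
    peptide_bonds = len(sequence) - 1
    totals[1] -= 2 * peptide_bonds
    totals[3] -= peptide_bonds
    return dict(zip("CHNOS", totals))


def formula(composition):
    """Format the composition as a formula string, skipping absent elements."""
    return "".join(f"{el}{n}" for el, n in composition.items() if n)


comp = atomic_composition("GA")  # the dipeptide glycylalanine
print(formula(comp), "total atoms:", sum(comp.values()))
```

For the dipeptide Gly-Ala this gives C5H10N2O3, which matches the known formula of glycylalanine.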
The estimated half-life is a prediction of the time required for half of the amount of a synthesized protein to turn over, both in vitro and in vivo. This value is calculated based on Varshavsky's "N-end rule", which predicts protein half-life based on the identity of the N-terminal amino acid residue of a protein (reviewed in Varshavsky, 1996, and Varshavsky, 1997). The N-terminal residue plays an important role in the determination of in vivo protein stability. The ordering of protein half-lives was determined by creating a series of ubiquitin-beta-gal fusion proteins in yeast in which the identity of the amino-terminal residue of beta-gal was varied. When these fusions were expressed in yeast, the ubiquitin moiety was cleaved, exposing the various residues at the N-termini. The half-lives of these proteins varied greatly, from less than 3 minutes to greater than 30 hours, depending on the identity of the residue (Bachmair et al., 1986). Similar experiments were carried out in E. coli and in mammalian reticulocytes (Gonda et al., 1989 and Tobias et al., 1991). Estimated half-lives were calculated using the ProtParam tool, available at ExPASy, and are not applicable to N-terminally modified proteins. Approximate half-lives of proteins in the three systems analyzed are summarized in the following table (taken from Varshavsky, 1997 and Gonda et al., 1989).
N-end rule and corresponding half-life of X-beta-gal
Residue X   Yeast      E. coli    Mammalian
Ala         >30 hour   >10 hour   4.4 hour
Arg         2 min      2 min      1.0 hour
Asn         3 min      >10 hour   1.4 hour
Asp         3 min      >10 hour   1.1 hour
Cys         >30 hour   >10 hour   1.2 hour
Gln         10 min     >10 hour   0.8 hour
Glu         30 min     >10 hour   1 hour
Gly         >30 hour   >10 hour   30 hour
His         3 min      >10 hour   3.5 hour
Ile         30 min     >10 hour   20 hour
Leu         3 min      2 min      5.5 hour
Lys         3 min      2 min      1.3 hour
Met         >30 hour   >10 hour   30 hour
Phe         3 min      2 min      1.1 hour
Pro         >5 hour    ?          >20 hour
Ser         >30 hour   >10 hour   1.9 hour
Thr         >30 hour   >10 hour   7.2 hour
Trp         3 min      2 min      2.8 hour
Tyr         10 min     2 min      2.8 hour
Val         >30 hour   >10 hour   100 hour
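Applying the N-end rule amounts to a table lookup on the first residue of the sequence. The sketch below copies the values from the table above; it is only an illustration, not the ProtParam implementation, and the names `HALF_LIFE`, `SYSTEMS` and `estimated_half_life` are hypothetical.

```python
# Half-lives from the table above, keyed by one-letter residue code:
# (yeast, E. coli, mammalian). "?" marks the value not given in the table.
HALF_LIFE = {
    "A": (">30 hour", ">10 hour", "4.4 hour"),
    "R": ("2 min", "2 min", "1.0 hour"),
    "N": ("3 min", ">10 hour", "1.4 hour"),
    "D": ("3 min", ">10 hour", "1.1 hour"),
    "C": (">30 hour", ">10 hour", "1.2 hour"),
    "Q": ("10 min", ">10 hour", "0.8 hour"),
    "E": ("30 min", ">10 hour", "1 hour"),
    "G": (">30 hour", ">10 hour", "30 hour"),
    "H": ("3 min", ">10 hour", "3.5 hour"),
    "I": ("30 min", ">10 hour", "20 hour"),
    "L": ("3 min", "2 min", "5.5 hour"),
    "K": ("3 min", "2 min", "1.3 hour"),
    "M": (">30 hour", ">10 hour", "30 hour"),
    "F": ("3 min", "2 min", "1.1 hour"),
    "P": (">5 hour", "?", ">20 hour"),
    "S": (">30 hour", ">10 hour", "1.9 hour"),
    "T": (">30 hour", ">10 hour", "7.2 hour"),
    "W": ("3 min", "2 min", "2.8 hour"),
    "Y": ("10 min", "2 min", "2.8 hour"),
    "V": (">30 hour", ">10 hour", "100 hour"),
}

# Column index of each experimental system in the tuples above.
SYSTEMS = {"yeast": 0, "ecoli": 1, "mammalian": 2}


def estimated_half_life(sequence, system="yeast"):
    """Look up the half-life predicted from the N-terminal residue."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    n_terminal = sequence[0].upper()
    return HALF_LIFE[n_terminal][SYSTEMS[system]]


# Hypothetical sequence starting with methionine.
print(estimated_half_life("MKVLA", "yeast"))
```

The lookup considers only the first residue, so it does not apply to N-terminally modified proteins, as noted above.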
The instability index was developed based on a statistical analysis of 12 unstable and 32 stable proteins (Guruprasad et al., 1990). This analysis revealed the presence of certain dipeptides that occurred with significantly different frequencies between stable and unstable proteins. A dipeptide instability weight value (DIWV) was assigned to each of 400 different dipeptides. These weight values were then used to calculate an instability index (II) defined as:
II = (10/L) * Sum(i = 1 to L-1) DIWV(x[i]y[i+1])
where: L is the length of the sequence
       DIWV is the instability weight value
       and x[i]y[i+1] is a dipeptide starting at position i.
Proteins with an instability index less than 40 are predicted to be stable, whereas those with a value greater than 40 are predicted to be unstable.
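The formula can be sketched in Python as follows. The real calculation needs the full table of 400 DIWV values from Guruprasad et al., which is not reproduced here; the `TOY_DIWV` table below contains made-up weights purely to show the arithmetic, and the function names are hypothetical.

```python
def instability_index(sequence, diwv):
    """Compute II = (10/L) * sum of DIWV over all consecutive dipeptides.

    `diwv` maps two-letter dipeptide strings to their weight values.
    """
    sequence = sequence.upper()
    length = len(sequence)
    if length < 2:
        raise ValueError("sequence must contain at least one dipeptide")
    # Dipeptides start at positions 1 .. L-1 (0 .. L-2 with Python indexing).
    total = sum(diwv[sequence[i:i + 2]] for i in range(length - 1))
    return (10.0 / length) * total


def predicted_stable(ii):
    """Apply the threshold from the text: values below 40 predict stability."""
    return ii < 40


# Made-up weights for illustration only, NOT the published DIWV values.
TOY_DIWV = {"AA": 1.0, "AK": -5.0, "KA": 24.0}

ii = instability_index("AAKA", TOY_DIWV)
print(ii, predicted_stable(ii))
```

With the toy weights, the dipeptides AA, AK and KA sum to 20, and the four-residue sequence gives II = (10/4) * 20 = 50, which falls on the unstable side of the threshold.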
The extinction coefficient (epsilon) is the wavelength-dependent molar absorptivity coefficient, with units of M^-1 cm^-1. The extinction coefficient indicates how much light a given protein will absorb at a certain wavelength (usually 280 nm). During protein purification, a spectrophotometer can be used to follow the protein of interest if the extinction coefficient is known. The molar extinction coefficient of a protein can be estimated from its amino acid composition. The extinction coefficient of the native protein in water can be calculated from the molar extinction coefficients of tyrosine, tryptophan and cystine (cysteine does not absorb much at wavelengths greater than 260 nm, while cystine does) using the following equation:
E(Prot) = Numb(Tyr)*Ext(Tyr) + Numb(Trp)*Ext(Trp) + Numb(Cystine)*Ext(Cystine)
where: Ext(Tyr) = 1490
Ext(Trp) = 5500
Ext(Cystine) = 125
The absorbance (optical density) of a 1 g/l solution of the protein, measured with a 1 cm pathlength, can then be calculated using the following formula:
Absorb(Prot) = E(Prot) / Molecular weight
The concentration of a protein can also be calculated from the extinction coefficient using the Beer-Lambert Law, since:
Absorb(Prot) = E x l x C
where: E = extinction coefficient
l = pathlength (cm)
C = protein concentration (M)
Two extinction coefficient values are calculated by ProtParam. The first value is based on the assumption that all cysteine residues appear as half cystines, and the second assumes that no cysteines appear as half cystines. The computation has been demonstrated to be quite reliable for proteins that contain Trp residues, but for proteins without Trp residues there may be more than a 10% error.
These calculations are based on the method developed by Edelhoch, 1967, using extinction coefficients for Trp and Tyr, as determined by Pace et al., 1995. The values used in the calculation of extinction coefficients for denatured proteins were also found to be accurate for calculating coefficients for the native protein (Gill and von Hippel, 1989). In general, since Trp residues contribute much more to the overall extinction coefficient than Tyr and cystine residues, the calculations tend to be much closer to measured values for proteins that contain Trp residues.
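The formulas in this section can be combined into a short Python sketch. It is an illustration rather than the ProtParam code: the function names are hypothetical, the molecular weight is supplied by the caller, and the number of cystines is assumed to be half the number of cysteine residues, rounded down.

```python
# Molar extinction coefficients (M^-1 cm^-1) quoted in the text.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125


def extinction_coefficient(sequence, cystines_formed=True):
    """E(Prot) from the numbers of Tyr, Trp and cystine.

    With cystines_formed=True every pair of Cys residues is assumed to
    form one cystine; with False no cystines are counted.
    """
    sequence = sequence.upper()
    n_tyr = sequence.count("Y")
    n_trp = sequence.count("W")
    n_cystine = sequence.count("C") // 2 if cystines_formed else 0
    return n_tyr * EXT_TYR + n_trp * EXT_TRP + n_cystine * EXT_CYSTINE


def absorbance_1_g_per_l(ext, molecular_weight):
    """Absorb(Prot) = E(Prot) / molecular weight."""
    return ext / molecular_weight


def concentration_molar(absorbance, ext, pathlength_cm=1.0):
    """Rearranged Beer-Lambert law: C = Absorb / (E x l)."""
    return absorbance / (ext * pathlength_cm)


seq = "WYYCC"  # hypothetical example sequence
print(extinction_coefficient(seq, cystines_formed=True))
print(extinction_coefficient(seq, cystines_formed=False))
```

For the example sequence the two values differ only by the single cystine term (125), which illustrates why the Trp and Tyr contributions dominate the estimate.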
The aliphatic index is the relative volume of a protein that is occupied by aliphatic side chains (alanine, isoleucine, leucine and valine). A high aliphatic index is regarded as contributing to the increased thermostability observed for globular proteins. The aliphatic index of a protein is calculated according to the following formula (Ikai, 1980):
Aliphatic index = X(Ala) + a * X(Val) + b * ( X(Ile) + X(Leu) )
where: X(Ala), X(Val), X(Ile), and X(Leu) are the mole percent (100 x mole fraction) of alanine, valine, isoleucine, and leucine
       a = 2.9 is the volume of the valine side chain relative to that of the alanine side chain
       b = 3.9 is the volume of the Leu/Ile side chains relative to that of the alanine side chain
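A minimal Python sketch of this formula is shown below. The function name `aliphatic_index` and the example sequence are hypothetical; the coefficients are the values of a and b given above.

```python
def aliphatic_index(sequence, a=2.9, b=3.9):
    """Aliphatic index = X(Ala) + a * X(Val) + b * (X(Ile) + X(Leu))."""
    sequence = sequence.upper()
    length = len(sequence)
    if length == 0:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue):
        # Mole percent = 100 x mole fraction of the residue.
        return 100.0 * sequence.count(residue) / length

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


# Hypothetical ten-residue sequence with one each of Ala, Val, Ile and Leu.
print(round(aliphatic_index("AVILGGGGGG"), 1))
```

In the example each aliphatic residue makes up 10 mole percent, so the index is 10 + 2.9 * 10 + 3.9 * 20 = 117.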
CodonW analyzes the correspondence between amino acids and codon usage in a set of protein-coding sequences, based on a given genetic code (for example, the code used in the S. cerevisiae nucleus versus the code used in its mitochondrion). CodonW was designed to work with any genetic code. Whether codons are synonymous or non-synonymous, how a codon is translated, how many codons are in a codon family, and how many synonyms a codon has are all determined at run time. Seven alternatives to the universal genetic code have been built into the program, including S. cerevisiae chromosomal codon usage and S. cerevisiae mitochondrial codon usage. In SGD, we have used these two built-in options, as appropriate, to perform codon usage-based calculations for chromosomally-encoded or mitochondrially-encoded ORFs. Note that codon usage-based calculations are not currently performed for ORFs present within transposable elements (Ty elements), because the codon usage of transposable element genes differs from that of chromosomal genes (see the CodonW tutorial).
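To illustrate what it means for codon families to be determined at run time from the chosen genetic code, the sketch below builds the standard code and the yeast mitochondrial code as plain dictionaries and derives codon family sizes from them. It does not use or reproduce CodonW; all names are hypothetical, and the mitochondrial code is modeled only by its well-known differences from the standard code (CUN = Thr, AUA = Met, UGA = Trp).

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Reassignments in the yeast mitochondrial code relative to the standard code.
YEAST_MITO_OVERRIDES = {
    "CTT": "T", "CTC": "T", "CTA": "T", "CTG": "T",
    "ATA": "M", "TGA": "W",
}


def build_code(overrides=None):
    """Return a codon -> amino acid dict, optionally with reassignments."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    code = dict(zip(codons, STANDARD_AA))
    code.update(overrides or {})
    return code


def codon_family_sizes(code):
    """Number of synonymous codons per amino acid (stop codons excluded)."""
    return Counter(aa for aa in code.values() if aa != "*")


def codon_usage(cds, code):
    """Count codon occurrences in a coding sequence, grouped by amino acid."""
    cds = cds.upper()
    usage = {}
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        usage.setdefault(code[codon], Counter())[codon] += 1
    return usage


nuclear = build_code()
mito = build_code(YEAST_MITO_OVERRIDES)
print(codon_family_sizes(nuclear)["L"], codon_family_sizes(mito)["L"])
```

Under these assumptions the leucine family shrinks from six codons to two and the threonine family grows from four to eight when the mitochondrial code is selected, so the same coding sequence yields different synonymous codon counts depending on the code chosen.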
Brief descriptions of related terms may be found in the SGD Glossary.