Nina Stoletzki,* John Welch, Joachim Hermisson,* and Adam Eyre-Walker
*Section of Evolutionary Biology, Department Biology II, Ludwig-Maximilians-University Munich, Planegg-Martinsried, Germany; and Centre for the Study of Evolution, University of Sussex, Brighton, United Kingdom
It has been suggested that volatility, the proportion of mutations which change an amino acid, can be used to infer the level of natural selection acting upon a gene. This conjecture is supported by a correlation between volatility and the rate of nonsynonymous substitution (dN), or the ratio of nonsynonymous and synonymous substitution rates, in a variety of organisms. These organisms include yeast, in which the correlations are quite strong. Here we show that these correlations are a by-product of a correlation between synonymous codon bias toward translationally optimal codons and dN. Although this analysis suggests that volatility is not a good measure of selection, we suggest that it might be possible to infer something about the level of natural selection, from a single genome sequence, using translational codon bias.
Understanding the nature of natural selection on DNA sequences is one of the central goals of molecular evolution. Plotkin, Dushoff, and Fraser (2004) and Plotkin et al. (2004) have recently suggested that it is possible to infer the level of natural selection, both positive and negative, acting upon a gene from a single genome sequence. They suggest that this can be achieved by measuring "volatility": the proportion of point mutations in a gene, excluding those that yield a stop codon, that change an amino acid. They base their method on the prediction that genes which have recently undergone amino acid substitutions should be populated by codons with high volatility (Plotkin et al. 2004). In support of their thesis they show that in both Mycobacterium and Saccharomyces species, there is a correlation between volatility and the rate of nonsynonymous substitution (dN) or the ratio of nonsynonymous and synonymous substitution rates (dN/dS). This correlation is quite strong in yeast, which suggests that volatility might be a useful measure of selection.

The idea that volatility can measure the level of selection, either positive or negative, on a gene has been criticized on a number of grounds (Dagan and Graur 2004; Friedman and Hughes 2004; Sharp 2004; Chen, Emerson, and Martin 2005; Hahn et al. 2005; Nielsen and Hubisz 2005; Zhang 2005). Much of the debate has centered around the reasons why volatility is not expected to correlate to dN and dN/dS. For example, it has been suggested that volatility is unlikely to measure selection because (1) it only depends on four or five amino acids (Dagan and Graur 2004; Sharp 2004; Chen, Emerson, and Martin 2005), (2) it has low variance (Dagan and Graur 2004), and (3) simple models of evolution fail to yield a correlation between dN/dS and volatility (Dagan and Graur 2004; Nielsen and Hubisz 2005; Zhang 2005). However, volatility is correlated to dN/dS (and dN); so much of this discussion, while interesting, is slightly tangential. The crucial question is

Almost all of these critiques point out that volatility is a measure of codon usage bias. As such, the apparent correlation between volatility and dN/dS in yeast may, in fact, as Hahn et al. (2005) suggest, be due to a correlation between translational codon bias and dN/dS. Although Hahn et al. suggest that the correlation between volatility and dN/dS may be due to a correlation between translational codon bias and selective constraint, they do not resolve whether this is the case. They show that a measure of translational codon bias, the codon adaptation index (CAI), explains more of the variance in volatility than dN/dS does in yeast, but they do not pursue the matter further. Plotkin, Dushoff, and Fraser (2005) investigate the partial correlation between dN/dS and volatility controlling for CAI and show it is significant, but they fail to give the magnitude of the effect.

It therefore remains unclear whether the principal correlation is between dN/dS and translational codon bias, with the correlation between dN/dS and volatility a by-product of this, or whether the principal correlation is between dN/dS and volatility. It might also be that both translational codon bias and volatility separately correlate to dN/dS.

To investigate the matter further, we take advantage of the fact that in yeast there is a strong correlation between dN/dS (or dN) and both translational codon bias (Pal, Papp, and Hurst 2001) and volatility (Plotkin, Dushoff, and Fraser 2004) and that in yeast some of the translationally optimal codons have relatively high volatility while others have relatively low volatility (table 1). It is well established that codon bias and gene expression are correlated in yeast (see e.g., Coghlan and Wolfe 2000). So volatility per amino acid is expected to increase (Ile, Leu, and Ser) or decrease (Arg and Gly) with translational codon bias or expression level (table 1). For example, the most optimal codon in yeast for arginine is AGA, which has relatively high volatility. If the principal correlation is between dN/dS (or dN) and translational codon bias, then we expect AGA usage to be negatively correlated to dN/dS (or dN), but if the principal correlation is between dN/dS (or dN) and volatility, then we expect AGA usage to be positively correlated to dN/dS (or dN).
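The definition of volatility given above can be made concrete with a short sketch. The Python below is illustrative only and is not the implementation of Plotkin and colleagues: the function name `codon_volatility`, the `kappa` parameter, and the choice to weight each transition by `kappa` relative to a transversion are assumptions made for this example. For a single codon under the standard genetic code, it computes the weighted proportion of single-nucleotide neighbours, excluding stop codons, that encode a different amino acid.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def codon_volatility(codon, kappa=1.0):
    """Weighted share of non-stop point-mutation neighbours of `codon`
    that encode a different amino acid.

    `kappa` is a hypothetical transition:transversion weight; kappa=1
    treats all point mutations equally.
    """
    amino_acid = GENETIC_CODE[codon]
    if amino_acid == "*":
        raise ValueError("volatility is not defined for stop codons")

    changed = 0.0  # weight of neighbours that change the amino acid
    total = 0.0    # weight of all neighbours that are not stop codons
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            neighbour = codon[:pos] + base + codon[pos + 1:]
            neighbour_aa = GENETIC_CODE[neighbour]
            if neighbour_aa == "*":
                continue  # mutations to stop codons are excluded
            weight = kappa if (codon[pos], base) in TRANSITIONS else 1.0
            total += weight
            if neighbour_aa != amino_acid:
                changed += weight
    return changed / total


if __name__ == "__main__":
    for codon in ("AGA", "CGT", "CGA"):
        print(codon, round(codon_volatility(codon), 3))
```

With `kappa = 1`, AGA scores 0.75, because six of its eight non-stop neighbours change the amino acid, whereas the arginine codon CGT scores 6/9. This is the sense in which AGA has relatively high volatility among the arginine codons.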
Our results are unequivocal: in yeast, dN/dS (and dN) is negatively correlated to the use of translationally optimal codons for all amino acids whose synonymous codons differ in their volatility, even in those whose optimal codons have high volatility. We further show that the correlation between dN/dS (or dN) and translationally optimal codon use is universal across all amino acids, including those whose synonymous codons do not differ in their volatility. The observed correlation between dN/dS (or dN) and volatility is a by-product of the correlation between dN (or dN/dS) and translational codon bias.

Key words: volatility, codon bias, selection, nonsynonymous
E-mail: [email protected]; a.c.eyre-walker@
Mol. Biol. Evol. 22(10):2022–2026. 2005
doi:10.1093/molbev/msi192 Advance Access publication June 15, 2005
© The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Table 1. Synonymous Codon Use of Volatility-Affecting Amino Acids
a Given the relative synonymous codon usage values of Kliman, Naheelah, and Santiago (2003). b Given the relative synonymous codon usage values of Sharp and Cowe (1991).

We downloaded the gene alignments from the four yeast species sequenced by Kellis et al. (2003). From these we excluded all genes which were not present in all four yeast species (Saccharomyces cerevisiae, Saccharomyces paradoxus, Saccharomyces mikatae, and Saccharomyces bayanus), which did not have start and stop codons in all species, which had premature stop codons, or which had frameshifting indels. This left 1,077 genes. This is smaller than the data set analyzed by Plotkin, Dushoff, and Fraser (2005) but has less chance of containing pseudogenes.

Plotkin, Dushoff, and Fraser (2004) suggest using a statistic, the volatility P value, to measure volatility. This is the probability of a gene having the observed volatility given the average synonymous codon use of the genes in the genome. The volatility P value of Plotkin, Dushoff, and Fraser (2004) is unlikely to be a very good statistic because it will depend to some extent on gene length (Sharp 2004) and amino acid composition (Dagan and Graur 2004; Zhang 2005): any statistic based on probability values depends on sample size, and the variance between synonymous codons for volatility differs between amino acids. To account for these shortcomings, we calculated an alternative measure, the average volatility,

$$V = \frac{1}{n} \sum_{aa} V_{aa},$$

where $V_{aa} = \sum_i X_i V_i / \sum_i X_i$ and $X_i$ is the number of times codon $i$ is used for the amino acid $aa$, $V_i$ is the volatility of that codon, and $n$ is the number of amino acids whose synonymous codons differ in their volatility. When considering amino acids separately we used $V_{aa}$, the average volatility per amino acid. Note that the volatility is only affected by five amino acids whose synonymous codons differ in their volatility: Arg, Gly, Ile, Leu, and Ser (the codons of Ile only differ when the transition:transversion ratio is different from unity). We assume a transition:transversion ratio of 4.1 to calculate the volatilities of individual codons, as suggested by Plotkin et al. We compute Plotkin's volatility P values us-

We measured translational codon bias per gene and per amino acid separately. To measure translational codon bias, we computed the CAI according to Sharp and Li (1987) with the corrections suggested by Bulmer (1988). We also calculated the frequency of optimal codons (FOP) according to the list of optimal codons for S. cerevisiae given by Kliman, Naheelah, and Santiago (2003). Volatility values and codon bias statistics were calculated for the S. cerevisiae sequence because this is the best studied of the species.

We used PAML (Yang 1997) to compute dN, dS, and dN/dS for each gene using the F3×4 model, in which codon frequencies are estimated from the nucleotide frequencies at the three codon positions. Because a physical definition of a site is more appropriate for the measurement of the synonymous substitution rate (dS), we express dS per codon (Bierne and Eyre-Walker 2003). We performed all our analyses on both dN and dN/dS. Although dN/dS is often regarded as a better measure of the selection acting upon nonsynonymous sites, it may not be in organisms, like yeast, in which there is selection on synonymous codon use. Indeed, we note that there is a strong correlation between dS per codon and codon usage bias in our data.

Confirming the analysis of Plotkin, Dushoff, and Fraser (2004), we found a highly significant correlation between the volatility P value of Plotkin, Dushoff, and Fraser (2004), or average volatility, and dN/dS (or dN) per gene (table 2). We also confirm the result of Pal, Papp, and Hurst (2001) that there is a strong correlation between measures of translational codon bias (FOP and CAI) and dN/dS.

So, is the observed correlation between volatility and dN/dS (or dN) due to the correlation between translational codon bias and dN/dS (or dN) or vice versa? To answer this, we look at the five volatility-affecting amino acids individually (table 3). We only observe a positive correlation
between volatility and dN/dS (or dN) for three of the amino acids (Ile, Leu, and Ser). The two amino acids which show a negative correlation between volatility and dN/dS (or dN), opposing the expectation of Plotkin, Dushoff, and Fraser (2004), are those (Arg, Gly) for which high translational codon usage (in high expression genes) leads to low volatility (see table 1). There is also no indication that volatility affects the correlation; the correlation between translational codon bias and dN/dS (or dN) is as strong for Arg and Gly,

The correlation between translational codon bias and dN/dS (or dN) is very consistent across amino acids: for almost every amino acid the correlation is negative and often significant, and if it is positive, the correlation is small.

Table 2. Spearman's Rank Correlation Coefficients Between dN, dN/dS, dS Per Codon, or dS and Volatility or Translational Codon Usage Bias for Each Gene
a Note that Plotkin's volatility P value is inversely related to volatility. *** P < 0.001.

Table 3. Spearman's Rank Correlation Coefficients Between dN, dN/dS, and dS Per Codon and Volatility or Translational Codon Usage Bias for the Five Amino Acids Affecting Volatility

Table 4. Spearman's Rank Correlation Coefficients Between dN, dN/dS, and dS Per Codon and Translational Codon Usage Bias for the Individual Amino Acids Not Affecting Volatility
P < 0.01, *** P < 0.001, NS = not significant.
* P < 0.05, ** P < 0.01, *** P < 0.001, NS = not significant.

Table 5. Partial Correlations of Measures of Translational Codon Bias and Measures of Volatility for Each Gene with dN and dN/dS
* P < 0.05, *** P < 0.001, NS = not significant.

We have shown that the observed correlation between dN/dS (or dN) and volatility is an incidental correlation caused by a correlation between dN/dS (or dN) and translational codon bias: dN/dS (or dN) correlates negatively with both translational codon bias and volatility for those amino acids in which the translationally optimal codons are high in volatility. This suggests that dN/dS (or dN) is not directly correlated to volatility and that volatility is therefore not the best, or even a good, predictor of dN/dS (or dN). This is not unexpected given recent theoretical work, which suggests that volatility will only be a measure of selection under rather specific conditions (Plotkin et al. 2004).

Our results may seem surprising given that Plotkin, Dushoff, and Fraser (2005) report a significant partial correlation between volatility P value and dN/dS in yeast using CAI to control for translational codon bias, a result we can confirm on our smaller data set (table 5). However, volatility P value is not normally distributed, so the probability of the partial correlation is not necessarily accurate, and the
significance of the partial correlation depends critically on the volatility statistic used. If we use our average volatility instead of the volatility of Plotkin et al., which will depend to some extent on gene length and amino acid composition (see Materials and Methods), then the partial correlation between dN/dS (or dN) and average volatility, controlling for translational codon bias, becomes very small and nonsignificant, while the partial correlation between dN/dS (or dN) and translational codon bias remains (table 5). The strongest correlations, either simple or partial, that we observe are between translational codon bias and dN/dS (or dN), which suggests that these are the primary correlations (tables 2 and 3).

It is also interesting to note that the correlation between codon bias and dN is consistently stronger than the correlation between codon bias and dN/dS. This is probably due to the fact that dS is correlated to codon bias and that this correlation is due to selection on codon usage bias and not variation in the mutation rate.

Although volatility does not appear to be a good measure of selection, Plotkin, Dushoff, and Fraser (2004) may have been correct in asserting that it may be possible to infer something about dN in a gene from a single genome sequence. A negative correlation between translational codon bias and dN has now been described in three different organisms: enteric bacteria (Sharp 1991; Rocha and Danchin 2004), Drosophila (Akashi 1994; Betancourt and Presgraves 2002; Marais et al. 2004), and yeast (Pal, Papp, and Hurst 2001), and we have shown that the correlation is consistent for all amino acids in yeast. Furthermore, although the basis of this correlation is unknown and subject to much debate (Betancourt and Presgraves 2002; Marais et al. 2004), at least one of the explanations is likely to lead to the correlation being widespread. It has been suggested that the correlation between codon bias and dN arises through a correlation in the strength of selection acting upon synonymous and nonsynonymous mutations, probably as a consequence of selection for translational accuracy: important amino acid sites in a protein will be subject to strong selection to be conserved during evolution and to be accurately translated (Akashi 1994). Thus any genome in which selection for translational accuracy is effective should show the correlation, and it may therefore be possible to use codon bias, maybe in combination with other information, such as amino acid composition or structural data (Tourasse and Li 2000), to predict which genes are likely to be fast-evolving genes. So, although volatility has come in for much criticism, Plotkin and colleagues may have drawn our attention to an approach to an important

We thank Daniel Jeffares for some initial work, Stephan Hutter and Pieter van Beek for help with Perl, and an anonymous referee for helpful comments.

Akashi, H. 1994. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics
Betancourt, A., and D. Presgraves. 2002. Linkage limits the power of natural selection in Drosophila. Proc. Natl. Acad. Sci. USA 99(21):13616–13620.
Bierne, N., and A. Eyre-Walker. 2003. The problem of counting sites in the estimation of the synonymous and nonsynonymous substitution rates: implications for the correlation between synonymous substitution rate and codon usage bias. Genetics
Bulmer, M. 1988. Are codon usage patterns in unicellular organisms determined by selection mutation balance? J. Evol. Biol.
Chen, W., J. J. Emerson, and T. M. Martin. 2005. Not detecting selection using a single genome. Nature 433:E6–E7.
Coghlan, A., and K. H. Wolfe. 2000. Relationship of codon bias to mRNA concentration and protein length in Saccharomyces
Dagan, T., and D. Graur. 2004. The comparative method rules! Codon volatility cannot detect positive Darwinian selection using a single genome sequence. Mol. Biol. Evol. 22:1260–1272.
Friedman, R., and A. L. Hughes. 2004. Codon volatility as an indicator of positive selection: data from eukaryotic genome comparisons. Mol. Biol. Evol. 22:542–546.
Hahn, M., J. G. Mezey, D. J. Begun, J. H. Gillespie, A. D. Kern, C. H. Langley, and L. Moyle. 2005. Codon bias and selection on single genomes. Nature 433:E5.
Kellis, M., N. Patterson, M. Endrizzi, and E. S. Lander. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241–254.
Kliman, R. M., I. Naheelah, and M. Santiago. 2003. Selection conflicts, gene expression, and codon usage trends in yeast.
Marais, G., T. Domazet-Loso, D. Tautz, and B. Charlesworth. 2004. Correlated evolution of synonymous and nonsynonymous sites in Drosophila. J. Mol. Evol. 59:771–779.
Nielsen, R., and M. J. Hubisz. 2005. Detecting selection needs
Pal, C., B. Papp, and L. D. Hurst. 2001. Highly expressed genes in yeast evolve slowly. Genetics 158:927–931.
Plotkin, J. B., J. Dushoff, M. M. Desai, and H. B. Fraser. 2004. Synonymous codon usage and selection on proteins.
Plotkin, J. B., J. Dushoff, and H. B. Fraser. 2004. Detecting selection using a single genome sequence of M. tuberculosis and P. falciparum. Nature 428:942–945.
Plotkin, J. B., J. Dushoff, and H. B. Fraser. 2005. Reply. Nature 433:E7–E8.
Rocha, E. P. C., and A. Danchin. 2004. An analysis of determinants of amino acid substitution rates in bacterial proteins. Mol.
Sharp, P. M. 1991. Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J. Mol. Evol. 33:23–33.
Sharp, P. M. 2004. Gene "volatility" is most unlikely to reveal adaptation. Mol. Biol. Evol. 22:807–809.
Sharp, P. M., and E. Cowe. 1991. Synonymous codon usage in Saccharomyces cerevisiae. Yeast 7:657–678.
Sharp, P. M., and W.-H. Li. 1987. The codon adaptation index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res.
Tourasse, N., and W.-H. Li. 2000. Selective constraints, amino acid composition and the rate of protein evolution. Mol. Biol.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci.
Zhang, J. 2005. On the evolution of codon volatility. Genetics