Analysis of synonymous codon usage in maize
--Sheila L. Fennoy, Gita Surti and Julia Bailey-Serres
Synonymous codon usage was previously examined for a small number of maize nuclear genes (Murray et al., Nucl. Acids Res. 17:477-497, 1989; Campbell and Gowri, Plant Physiol. 92:1-11, 1990; Hamilton and Mascarenhas, MNL65:2-3, 1991). We have completed a codon usage analysis of 101 nuclear genes obtained from the GenBank (Release 73, 9/92) and EMBL (Release 32, 9/92) databases and from the literature. Codon usage tables were generated with the Genetics Computer Group program CODON FREQUENCY (Devereux et al., Nucl. Acids Res. 12:387-395, 1984). The relative synonymous codon usage (RSCU) of the 101 genes was calculated to show the non-uniformity of synonymous codon usage in maize (Table 1). Codons ending in G or C (GC3) were the most frequently used and are indicated in boldface type. This preference for codons ending in G or C reflects the high GC content of the maize genome.
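The two quantities used throughout this note, RSCU and GC3, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GCG CODON FREQUENCY program: RSCU is taken in its usual sense (the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally), GC3 is taken here as the fraction of all counted codons whose third base is G or C, and the function names and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codons(cds):
    """Count in-frame codons of a coding sequence (hypothetical helper)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in GENETIC_CODE)


def rscu(counts):
    """RSCU = observed count / count expected under equal synonymous usage."""
    families = {}
    for codon, aa in GENETIC_CODE.items():
        if aa != "*":  # stop codons are skipped in this sketch
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined for its codons
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values


def gc3(counts):
    """Fraction of counted codons ending in G or C (assumed definition)."""
    total = sum(counts.values())
    ending_gc = sum(n for codon, n in counts.items() if codon[2] in "GC")
    return ending_gc / total if total else 0.0


# Toy example (not maize data): four alanine codons, GCC used twice.
toy_counts = count_codons("GCCGCCGCGGCT")
toy_rscu = rscu(toy_counts)  # GCC -> 2.0, GCG -> 1.0, GCT -> 1.0, GCA -> 0.0
toy_gc3 = gc3(toy_counts)    # 3 of 4 codons end in G or C -> 0.75
```

An RSCU of 1.0 means a codon is used exactly as often as expected without bias; values above 1.0 mark preferred codons.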
To examine synonymous codon usage among genes, correspondence analysis, a multivariate statistical technique (Sharp and Lloyd, Mol. Gen. Genet. 230:288-294, 1991), was performed on the codon frequency tables of the 101 maize genes. The analysis produced a two-dimensional plot depicting the first and second most influential factors that distinguish the patterns of codon usage of individual genes. The displacement of genes in Dimension 1 reflected differences among genes in bias for GC3. Genes that plotted above zero in Dimension 1 had GC3 values ranging from 30 to 70 percent; those below zero had values ranging from 70 to 90 percent. The displacement of genes in Dimension 2 was less extreme and reflected differences among genes in use of the set of codons most common in maize (Table 1). In summary, the genes separated in Dimension 1 by GC3 content and in Dimension 2 by codon selection.
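For readers unfamiliar with the method, correspondence analysis of a genes-by-codons count table can be sketched with NumPy as below. This is a generic textbook formulation (singular value decomposition of the standardized residuals), not necessarily the exact procedure or software used for Figure 1; the function name and the toy table are hypothetical, and every gene is assumed to contribute at least one codon.

```python
import numpy as np


def correspondence_analysis(table, n_dims=2):
    """Return gene coordinates and the fraction of inertia per dimension.

    table: rows are genes, columns are codons, entries are codon counts.
    """
    counts = np.asarray(table, dtype=float)
    counts = counts[:, counts.sum(axis=0) > 0]  # drop codons never observed
    p = counts / counts.sum()                    # correspondence matrix
    row_mass = p.sum(axis=1)                     # weight of each gene
    col_mass = p.sum(axis=0)                     # average codon profile
    expected = np.outer(row_mass, col_mass)
    residuals = (p - expected) / np.sqrt(expected)
    u, singular, _ = np.linalg.svd(residuals, full_matrices=False)
    # Principal coordinates of the genes; the origin is the average profile.
    coords = (u * singular) / np.sqrt(row_mass)[:, None]
    inertia = singular ** 2
    return coords[:, :n_dims], inertia[:n_dims] / inertia.sum()


# Toy table (not maize data): genes 0 and 1 share a codon profile, gene 2 differs.
toy_table = [[10, 0, 5],
             [20, 0, 10],
             [1, 9, 5]]
toy_coords, toy_explained = correspondence_analysis(toy_table)
```

In this formulation the origin corresponds to the average codon usage of all genes, and genes with identical relative codon usage receive identical coordinates. The sign of each axis is arbitrary, so which side of zero the GC3-rich genes fall on depends on the software used.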
Genes were grouped by subcellular location, function or condition for induction to discern differences in codon usage that reflected characteristics of expression (Fig. 1). Of the non-zein genes, synonymous codon usage of highly expressed genes encoding structural and photosynthetic proteins was biased towards high GC3 content. The ABA-inducible genes were biased in both GC3 and codon choice, as shown by their displacement in Dimensions 1 and 2 (Fig. 1). The group of regulatory genes included transcription factors, transposable elements, phosphatases and kinases. These genes were distinguished from each other predominantly by their bias in GC3. Genes encoding transcription factors had higher GC3 values than those encoding kinases and phosphatases. The open reading frames of the transposable elements MuR1, Ac (ORFa), and Spm (TNPa) were GC3 poor and showed an extreme bias in codon usage. Of the genes encoding structural proteins, the histones and lipid-body-associated proteins were most biased in GC3. These genes are most likely highly expressed. The genes encoding cytosolic enzymes had average GC3 content and plotted at negative values in Dimension 2. The storage proteins included zein and non-zein proteins. As noted by others (cf. Hamilton and Mascarenhas, MNL65:2-3, 1991), the genes encoding the 19 and 22 kD zeins of endosperm have unusual codon usage. The correspondence analysis clearly demonstrates the relatively low GC3 content and extreme codon usage bias of the 19 and 22 kD zeins. In contrast, the 15 and 16 kD zeins showed near average GC3 content but distinct codon selection.
Synonymous codon usage reflects the co-adaptation of the population of charged tRNAs and the coding sequence. The degree of non-random codon usage in each gene may reflect its rate of translation and/or the mutational bias of the genome. We plan to test whether codon usage affects the rate of elongation in vivo and in vitro in maize. The analysis may provide important information on translational control mechanisms in plants.
The RSCU and correspondence analyses presented here should prove useful for designing degenerate oligonucleotides for polymerase chain reaction amplification of coding sequences in maize.
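As one illustration of how codon preferences might guide oligonucleotide design, the sketch below back-translates a short peptide into an IUPAC-degenerate oligonucleotide and reports its fold degeneracy, optionally restricting each amino acid to a set of preferred codons. The function name, the peptide and the preferred-codon set are hypothetical and are not taken from Table 1; in practice the preferred set would come from codons with high RSCU.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_FOR = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))

# IUPAC ambiguity codes, keyed by the alphabetically sorted set of bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
         "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N"}


def degenerate_oligo(peptide, preferred=None):
    """Back-translate a peptide into a degenerate oligo and its fold degeneracy.

    preferred: optional set of codons to keep (for example, codons with high
    RSCU); every amino acid in the peptide must retain at least one codon.
    """
    oligo, fold = [], 1
    for aa in peptide.upper():
        codons = [c for c in CODONS_FOR[aa]
                  if preferred is None or c in preferred]
        if not codons:
            raise ValueError(f"no allowed codon for amino acid {aa!r}")
        for pos in range(3):
            bases = "".join(sorted({c[pos] for c in codons}))
            oligo.append(IUPAC[bases])
            # Position-wise encoding can cover extra codons, e.g. for serine,
            # so this product is an upper bound on the codons intended.
            fold *= len(bases)
    return "".join(oligo), fold


# Hypothetical dipeptide Lys-Phe with and without a preferred-codon filter.
all_codons = degenerate_oligo("KF")                         # ("AARTTY", 4)
gc_ending = degenerate_oligo("KF", preferred={"AAG", "TTC"})  # ("AAGTTC", 1)
```

Restricting the pool to preferred codons lowers the degeneracy of the primer at the cost of possible mismatches with genes, such as the 19 and 22 kD zeins, whose codon usage departs from the maize average.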
Table 1. Summary of Codon Usage. Presented are: amino acid, AA; sum of the frequencies of the codons, N; relative synonymous codon usage, RSCU; relative composition of each amino acid in all genes, AC. Bias in codon usage in maize is measured by the RSCU, which describes the disproportionate use of synonymous codons: a value of 1 indicates that a codon is used as often as expected if all synonymous codons for that amino acid were used equally, and a value above 1 indicates a preferred codon. The most commonly used synonymous codons are those ending in C or G and are shown in boldface type.
Figure 1. Correspondence Analysis of Codon Usage. This multivariate statistical analysis was done on 101 maize gene sequences to discern differences in codon usage. Coding sequences are identified by subcellular location, function, or induction characteristics of the protein. The two axes, Dimension 1 and 2, depict the first and second most influential factors for dispersion. The origin represents the average codon usage for all genes. The distance between genes on the graph is a measure of their dissimilarity in synonymous codon usage.