With the development of genomics as a collection of technologies for the study of the genome and individual gene function, cloned and characterized copies of genes are becoming available in unprecedented numbers for many species. In particular, the development of the Expressed Sequence Tag (EST) approach by C. Venter (Adams et al., Science 252:1651-1656, 1991), coupled with continually-decreasing costs for obtaining sequence data, has provided researchers with a cheap and high-volume method for obtaining gene sequence data for literally thousands of genes in species not previously considered as model systems. In this approach, clones are randomly selected from cDNA libraries, prepared from a number of representative tissues and developmental states, and subjected to single pass sequencing, usually from the presumed 5' terminus of the original mRNA. Comparison of the resulting sequence data with entries in public databases is often able to provide a significant similarity indicative of gene function for between 40 and 60% of clones so analyzed. Given the relatively high degree of amino acid sequence conservation of genes across species and even genera, use of gene sequence data gathered by this method for expressed regions provides a method whereby homologs in your favorite species can often be identified for any published gene and function, and sometimes across very distant evolutionary gulfs. In a very real sense, this approach can be viewed as the "Rosetta Stone" for biological research in that DNA sequence data for genes can be used to "translate" an advance in knowledge in one species into your favorite species by providing researchers with a clone for that gene. Subsequent study of that gene using the clone as a tool can then significantly accelerate the spread of knowledge from model systems to those previously thought to be intractable or impractical for many advanced techniques.
Researchers at Pioneer Hi-Bred are developing a large EST collection
from maize and we have been investigating its utility in isolating and
identifying maize homologs for genes first described in other species.
During this process we have encountered two difficulties with this approach,
the problem of isolating ESTs for rarely-expressed genes and the issue
of distinguishing related gene family members from actual homologs. By
virtue of the fact that mRNAs are expressed at very different rates in
different tissues, developmental states, and under different environmental
conditions, and given that ESTs are selected at random only from the
cDNA libraries that are prepared, it is not surprising that not all genes
are easily found by this approach. Nevertheless, given well over 100 cDNA
libraries screened and resulting in over 100,000 entries, we were still
surprised at our inability to identify maize homologs for many genes already
isolated in other species, albeit usually by other methods. For instance,
in looking for maize homologs for the following genes involved in the initiation
of flowering which we would anticipate to be rarely-expressed and/or only
limited to a small number of cells at specific developmental states, we
have only found convincing evidence for three homologs. Even looking for
ESTs for genes already cloned in corn, such as teosinte-branched, terminal
ear, dwarf3, and purple plant1, we have been unable to
identify any with complete identity. While both subtraction and normalization
strategies should exert a positive effect upon some aspects of this problem
and Pioneer researchers have begun to incorporate them into their library
construction protocols, this augmentation still does not address the issues
of not being able to economically sample all possible tissues, developmental
stages, and environmental conditions and hence may not help in the identification
of ESTs for many genes in corn.
gene name | number of ESTs |
constans | 1 |
leafy/floricaula | 0 |
terminal flower/centroradialis | 1 |
luminidependens | 1 |
FCA | 0 |
phyB | 0 |
OsMADS1 | 12 |
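The sampling argument above can be made concrete with a small calculation. The sketch below is an illustration, not an analysis performed for this note: it assumes that every sequenced clone is an independent random draw from a single pooled mRNA population, which real libraries only approximate, and the function names and the abundance value in the example are hypothetical. Only the figure of 100,000 ESTs comes from the text.

```python
import math


def prob_at_least_one_est(abundance: float, n_clones: int) -> float:
    """Chance that at least one of n_clones random clones comes from a
    transcript that makes up `abundance` (a fraction, 0..1) of the pool.

    Uses 1 - (1 - f)^N, evaluated with log1p/expm1 so that very small
    abundances do not lose precision.
    """
    if not 0.0 <= abundance <= 1.0:
        raise ValueError("abundance must be a fraction between 0 and 1")
    if n_clones < 0:
        raise ValueError("n_clones must be non-negative")
    if abundance == 1.0:
        return 1.0 if n_clones > 0 else 0.0
    return -math.expm1(n_clones * math.log1p(-abundance))


def clones_needed(abundance: float, confidence: float) -> int:
    """Smallest number of random clones giving at least `confidence`
    probability of recovering one clone for the transcript."""
    if not 0.0 < abundance < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("abundance and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log1p(-confidence) / math.log1p(-abundance))


if __name__ == "__main__":
    n_ests = 100_000          # collection size mentioned in the text
    rare_fraction = 1e-6      # hypothetical: one transcript per million in the pool
    print(prob_at_least_one_est(rare_fraction, n_ests))
    print(clones_needed(rare_fraction, 0.95))
```

Under these assumptions a transcript present at one part per million has roughly a one in ten chance of appearing among 100,000 random clones, which is consistent with the observation that rarely-expressed genes are frequently missed. Normalization and subtraction raise the effective abundance of rare messages but cannot help with tissues or conditions that were never sampled.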
Similarly, we have at times had difficulty in distinguishing homologs for some genes first identified in other species, both for rarely-expressed genes and at times even for more abundantly-expressed gene families. Again with genes purported to play a role in the vegetative-to-flowering transition, we identified a number of possible homologs for the Arabidopsis gene constans, at the level of possessing significant amino acid similarity. Examination of those which still possessed similarity at the nucleotide level helped to reduce this list further, but the single most likely candidate, when examined further, appeared to lack all of the important sequence elements identified in the Arabidopsis gene, and we have been unable to confirm any strong functional relationship to it by other methods. Similar difficulties have been encountered in discerning potential homologs for luminidependens. In our examination of genes involved in the lignin biosynthetic pathway, we have encountered similar difficulties, although these genes are expressed at much higher levels than the regulators of the flowering process. For instance, at least four and probably more homologs have been found for the 4-coumarate ligase (4CL) gene, all with relatively high BLAST probabilities to previously described 4CL genes in rice, potato, and soybean. From the expression patterns of these genes, it is not clear which of them actually participate in lignin biosynthesis and which may simply possess sequence elements in common with the 4CL gene and biochemical activity towards lignin precursors. Numerous genes for C-OMT, CCoA-OMT, and CAD have also been identified, and their actual involvement in lignin biosynthesis has proven difficult to confirm. This phenomenon is not limited to this pathway, as we have identified at least one and probably two more expressed sucrose synthases in addition to the previously-described sh1 and sus1 genes.
In summary, while the EST approach provides an economic strategy for
the isolation of clones for genes identified in other species, it is not
without complications. Given the large number of ESTs in our database and
the portion which seem related but probably not identical to the described
genes, one is left wondering just how many genes corn may actually possess.
It is possible to calculate a fair estimate for the number of genes in
Arabidopsis thaliana from the total genome size divided by the average gene size,
which seems close to the average number of genes per kilobase being found
by actual genome sequencing. At this time, such a calculation in corn is
meaningless given the very large genome size, primarily resulting from
the amount of intervening retrotransposon-like sequences in corn. One has
to wonder if corn will in fact be limited to that same general estimate
even with its duplicate genome structure, given the large number of apparent
homologs or related gene family members which we have already identified.
It may be that corn not only has enlarged its genome by both the amplification
of intervening sequences and polyploidization, but that it additionally
may have created many diverged copies of genes with similar functions.
Further genome sequencing and evaluation of gene function will be required
to better understand this apparent conundrum.
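The back-of-envelope estimate described above (total genome size divided by average gene size) can be written out as a short sketch. It is only an illustration under stated assumptions: the function name, the `genic_fraction` parameter, and all numeric values in the example are hypothetical round-number placeholders, not measurements for Arabidopsis or corn.

```python
def estimate_gene_count(genome_size_bp: float,
                        avg_gene_size_bp: float,
                        genic_fraction: float = 1.0) -> float:
    """Rough gene-number estimate: genome size divided by average gene size.

    genic_fraction is a hypothetical correction for the share of the genome
    that is actually occupied by genes. With the default of 1.0 the function
    is the plain division described in the text.
    """
    if genome_size_bp <= 0 or avg_gene_size_bp <= 0:
        raise ValueError("sizes must be positive")
    if not 0.0 < genic_fraction <= 1.0:
        raise ValueError("genic_fraction must be in (0, 1]")
    return genome_size_bp * genic_fraction / avg_gene_size_bp


if __name__ == "__main__":
    # Placeholder values chosen only to show the arithmetic.
    compact_genome_bp = 100_000_000   # hypothetical compact genome
    avg_gene_bp = 5_000               # hypothetical average gene size
    print(estimate_gene_count(compact_genome_bp, avg_gene_bp))

    # For a genome dominated by intervening repeats the answer depends
    # almost entirely on genic_fraction, which is the unknown quantity.
    for fraction in (0.05, 0.10, 0.20):
        print(fraction, estimate_gene_count(2_500_000_000, avg_gene_bp, fraction))
```

The second loop shows why the plain division is uninformative for corn: when most of the genome consists of intervening retrotransposon-like sequence, the estimate scales directly with an assumed genic fraction that cannot be known without further sequencing.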