BOZEMAN, MONTANA

Montana State University

CORVALLIS, OREGON

Oregon State University

Genome mapping with non-inbred crosses using GMendel 2.0

--Craig Echt, Steven Knapp and Ben-Hui Liu

A powerful linkage analysis computer program, GMendel 2.0, has been developed which allows genome maps to be constructed from any type of diploid cross. GMendel 2.0 is unique in that it can perform multipoint linkage analysis on populations with complex genetic structures, such as those arising from an F1, F2 or backcross between highly heterozygous parents, as well as from more traditional mapping crosses, such as from an F2 from inbred parents. The general applicability of the program also allows for mapping of dominant genetic markers, such as those associated with AFLPs (amplified fragment length polymorphisms, or RAPDs), null-allele RFLPs and dominant morphological phenotypes. By way of example we describe below the construction of a linkage map of diploid alfalfa, but genome maps could be constructed from any population of similar genetic complexity.

In collaboration with Tom McCoy at Montana State U. and Tom Osborn and Kim Kidwell at the U. of Wisconsin we have constructed the first genetic linkage map of Medicago sativa using GMendel 2.0 (manuscript in preparation). The current map incorporates 57 RFLP and 128 RAPD markers into 8 linkage groups, corresponding to the 8 chromosomes of alfalfa, and has a total length of 1,321 recombination units. The 88 progeny we used for linkage analysis were from a backcross between diploid clones derived from cultivated alfalfa, an outcrossing tetraploid. The use of a backcross from highly heterozygous parents was necessary because cultivated alfalfa is very susceptible to inbreeding depression and inbred lines are not generally available.

A number of different segregation types can arise, depending on the type of cross, when parents heterozygous at many loci are used to generate a population for linkage analysis. Figure 1 lists all the possible parental phenotypes and their segregation genotypes generated from F1s, F2s and backcrosses. Loci in an F2 population, whether from inbred (homozygous) or heterozygous parents, have a maximum of two alleles which can segregate either as 1:2:1 or 1:3 (for dominant markers). For a backcross population from inbred parents, loci can have two alleles which segregate either as 1:1 or 1:3, while if from heterozygous parents, loci can have up to three alleles which can segregate either as 1:1, 1:3, 1:2:1, 1:1:1:1 or 2:1:1. In an F1 population from heterozygous parents there can be up to 4 alleles per locus but the segregation classes are the same as those found from a backcross. There is, of course, no segregation expected in an F1 population from inbred parents.

Figure 1. All possible parental DNA marker phenotypes and general segregation genotypes for single loci arising from crosses using heterozygous parents. All cases are possible from an F1 cross. The only segregation types possible from an F2 cross (self) are marked "*". Those segregation types possible from a backcross are marked both "*" and "**". In the case of a backcross P1 refers to the F1 and P2 to the recurrent backcross parent. Alleles of a locus are designated a, b, c, d or 0 (null). The segregation and progeny codes are used by the current version of GMendel 2.0 for scoring parental and progeny phenotypes in the mapping database matrix. Future versions of the program will have separate codings for 1:1:1:1 and 2:1:1 segregants to allow for even greater precision of recombination and ordering estimates.
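The segregation ratios named above can be checked by enumerating gametes. The following is a minimal sketch, not part of GMendel 2.0: the function names are hypothetical, parental genotypes are written with the allele letters of Figure 1, and markers are assumed to be scored as banding phenotypes in which the null allele 0 is invisible.

```python
from collections import Counter
from itertools import product


def phenotype(genotype):
    """Return the visible bands of a diploid genotype; '0' (null) shows no band."""
    bands = sorted(set(genotype) - {"0"})
    return "".join(bands) or "0"


def segregation(p1, p2):
    """Count progeny phenotypes from the four equally likely gamete pairings."""
    counts = Counter(phenotype(g1 + g2) for g1, g2 in product(p1, p2))
    return dict(counts)


print(segregation("ab", "aa"))  # two phenotype classes, 1:1
print(segregation("ab", "ab"))  # three classes, 1:2:1
print(segregation("a0", "a0"))  # band present vs. absent, 3:1
print(segregation("ab", "cd"))  # four classes, 1:1:1:1
print(segregation("ab", "a0"))  # three classes, 2:1:1
```

Each count is out of four gamete combinations, so a result such as {"a": 2, "ab": 1, "b": 1} corresponds to a 2:1:1 segregation.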

Any program of general applicability which seeks to create a comprehensive linkage map from heterozygous crosses must be able to generate two-point recombination estimates from all possible matings between multiple segregation types, properly infer linkage phases and correctly order each locus with respect to all others. The main problems in accomplishing this arise from the presence of multiple alleles and multiple segregation classes, and from the inability to know linkage phases a priori due to the lack of information about the genetic structure of the parents.

Programs such as Mapmaker (Lander et al., Genomics 1:174, 1987) are designed to analyze segregation data from inbred parents only. As pointed out by Ritter et al. (Genetics 125:645, 1990), Mapmaker cannot integrate data from multiple segregation classes into one map. Ritter et al. have described a method which generates two-point estimates and linkage subgroups from populations with multiple segregation types, but ordering is accomplished by an ad hoc method that is not true multipoint mapping. This method was used in developing an RFLP map of diploid potato from a backcross population using heterozygous parents (Gebhardt et al., TAG 78:65, 1989).

By the use of the appropriate segregation class codes and progeny phenotype codes (Fig. 1) GMendel 2.0 generates two-point maximum likelihood estimates for all pairwise matings between all loci. Linkage phases are correctly assigned based on probability rules and gene order is estimated using an advanced multipoint mapping algorithm. Missing progeny data are neither estimated nor substituted and are simply excluded from the two-point estimates. As can be noted from Figure 1, the current version of GMendel ignores multiple alleles and classifies loci as segregating either 1:1, 1:2:1, or 1:3. Loci segregating 1:1:1:1 are scored as 1:2:1 segregants and loci segregating 2:1:1 are scored as 1:1 segregants (see Figure 1). Although limiting scoring to only two alleles does not take full advantage of all the genetic information present in multiallelic segregants, it does retain all the information available from two alleles and does not compromise the accuracy of the map. This approach was made necessary by program development constraints and will be altered in future versions of the program to allow for complete genetic classification of the progeny.
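As an illustration of a two-point estimate with missing data excluded, here is a minimal sketch for the simplest case only: two loci that each segregate 1:1, analyzed under an assumed coupling phase. It is not GMendel's code. The function name, the progeny codes "1" and "2", and the missing-data symbol "-" are hypothetical stand-ins for the codes of Figure 1.

```python
import math


def two_point_1to1(scores_a, scores_b, missing="-"):
    """Coupling-phase recombination estimate and LOD for two 1:1 loci.

    Progeny missing a score at either locus are dropped, not imputed.
    Returns (r, lod, n) where n is the number of progeny actually used.
    """
    pairs = [(a, b) for a, b in zip(scores_a, scores_b)
             if a != missing and b != missing]
    n = len(pairs)
    if n == 0:
        raise ValueError("no progeny scored at both loci")
    # Under coupling, progeny whose two scores differ are recombinants.
    k = sum(1 for a, b in pairs if a != b)
    r = k / n  # maximum likelihood estimate for this binomial case

    def term(count, p):
        # count * log10(p), with the convention 0 * log10(0) = 0
        return count * math.log10(p) if count else 0.0

    # LOD = log10 of L(r) / L(0.5), where L(r) = r^k * (1 - r)^(n - k)
    lod = n * math.log10(2) + term(k, r) + term(n - k, 1 - r)
    return r, lod, n


r, lod, n = two_point_1to1("1212121212", "1212-21211")
print(n, round(r, 3), round(lod, 2))
```

In this example one progeny is unscored at the second locus, so the estimate rests on the remaining nine. The 1:2:1 and 1:3 classes need their own likelihood equations, which this sketch does not attempt.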

No a priori knowledge of linkage phase is needed since GMendel 2.0 uses simple probability rules to infer whether two loci are linked in coupling or repulsion. When the LOD score for two loci is greater than 3.0 and when a maximum likelihood estimate for coupling is used, then a recombination estimate of less than 0.30 indicates the loci are in coupling, while an estimate greater than 0.70 indicates that the loci are in repulsion. (LOD means the log of the odds, the "odds" being the ratio of the probability that two loci are linked with a given recombination value over the probability that the two are not linked. A LOD over 3.0 means that the chances are greater than 1000:1 that the loci are linked for a given recombination estimate.) A correct repulsion recombination estimate is obtained by subtracting from 1 any coupling estimate over 0.70 (with a LOD > 3.0). This method of determining linkage phases is 100% accurate when the probability rules are met and does not require knowledge of the genotypes of the parents. Loci having recombination estimates above 0.30 are assigned to unlinked loci or to two-loci linkage groups.
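The phase rules above reduce to a few comparisons. The sketch below uses the thresholds stated in the text (LOD > 3.0, 0.30 and 0.70); the function name, the parameter names and the "unassigned" label for pairs that meet neither rule are assumptions made for illustration.

```python
def infer_phase(r_coupling, lod, lod_min=3.0, r_max=0.30):
    """Classify a locus pair from its coupling-phase estimate and LOD.

    Returns (phase, r) where r is the phase-corrected recombination
    estimate, or ("unassigned", None) when the rules are not met.
    """
    if lod <= lod_min:
        return ("unassigned", None)
    if r_coupling < r_max:
        return ("coupling", r_coupling)
    if r_coupling > 1 - r_max:
        # A coupling estimate above 0.70 is re-expressed as 1 - r.
        return ("repulsion", 1 - r_coupling)
    return ("unassigned", None)


print(infer_phase(0.10, 5.2))  # coupling, estimate kept as is
print(infer_phase(0.88, 4.1))  # repulsion, estimate becomes 1 - 0.88
print(infer_phase(0.10, 2.0))  # LOD too low, no phase assigned
```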

Multipoint gene order is determined by GMendel 2.0 using a powerful method called the simulated annealing algorithm (SAA). The details of SAA will be presented elsewhere but, in brief, it estimates the shortest linear map, the global minimum, by simulating different gene orders for groups of loci in a progressive manner and saving only the shortest orders. A number of different ordering criteria are used in estimating minimum distances. The two main ordering criteria are the sum of adjacent recombination frequencies matrix method and the sum of adjacent 2-point LOD scores matrix method. The SAA can obtain gene orders even when non-informative matings exist between some of the loci. An example of a non-informative mating is when two loci, A and B, are each segregating 1:1 among the progeny but locus A is homozygous in one parent and locus B is homozygous in the other parent. Recombinant progeny cannot be detected from such a mating. Such matings are devoid of ordering information but gene order can be inferred from the relative order and distances of adjacent loci which do give informative matings.
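Since the details of GMendel's SAA are not given here, the following is only a generic textbook simulated annealing sketch that minimizes the sum of adjacent recombination frequencies. The function names, the segment-reversal move, the cooling schedule and all parameter values are assumptions, not the program's actual procedure.

```python
import math
import random


def sar(order, r):
    """Sum of adjacent recombination frequencies for a given locus order."""
    return sum(r[a][b] for a, b in zip(order, order[1:]))


def anneal_order(r, steps=20000, t0=1.0, cooling=0.9995, seed=0):
    """Search for the locus order with the smallest SAR.

    r is a symmetric matrix of pairwise recombination estimates.
    Returns (best_order, best_sar) seen during the search.
    """
    rng = random.Random(seed)
    n = len(r)
    order = list(range(n))
    rng.shuffle(order)
    cur = sar(order, r)
    best, best_cost = order[:], cur
    t = t0
    for _ in range(steps):
        # Propose a new order by reversing a randomly chosen segment.
        i, j = sorted(rng.sample(range(n), 2))
        cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        cost = sar(cand, r)
        # Always accept improvements; accept worse orders with a
        # probability that shrinks as the temperature t falls.
        if cost <= cur or rng.random() < math.exp((cur - cost) / t):
            order, cur = cand, cost
            if cur < best_cost:
                best, best_cost = order[:], cur
        t *= cooling
    return best, best_cost


# Toy data: six loci at known positions, distances used as estimates.
pos = [0.0, 0.05, 0.12, 0.20, 0.27, 0.35]
r = [[abs(a - b) for b in pos] for a in pos]
print(anneal_order(r))
```

An order and its reverse have the same SAR, so either orientation is an equally valid answer. Handling non-informative matings would require skipping or inferring the missing pairwise entries, which this sketch leaves out.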

The current version of GMendel 2.0 does not incorporate maximum likelihood equations for estimating recombination for loci exhibiting differential viability. However, skewed segregation ratios do not seem to have a significant effect on the recombination estimates or gene order as long as the ratios are not profoundly skewed. This was evident from our alfalfa map where 45% of the loci for which we obtained segregation data did not fall within Mendelian expectations at the p < 0.05 level. The high level of segregation distortion present in our mapping population results in large part from a high genetic load and can be expected within most open-pollinated or wide-cross populations. Future versions of GMendel will utilize maximum likelihood recombination estimates which will minimize the effects of distorted segregation ratios.
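The text does not name the test used to flag distorted loci. Assuming a standard chi-square goodness-of-fit test against the expected Mendelian ratio, a locus could be screened as in the sketch below. The function name and the example counts are hypothetical; the critical values are the usual 5% points of the chi-square distribution.

```python
# Chi-square critical values at p = 0.05, indexed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def segregation_chi2(observed, expected_ratio):
    """Goodness-of-fit of observed class counts to a Mendelian ratio.

    Returns (chi2, distorted) where distorted is True when the fit is
    rejected at the 0.05 level.
    """
    n = sum(observed)
    total = sum(expected_ratio)
    expected = [n * e / total for e in expected_ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, chi2 > CHI2_CRIT_05[df]


print(segregation_chi2([44, 44], [1, 1]))         # perfect 1:1 fit
print(segregation_chi2([60, 28], [1, 1]))         # skewed 1:1 locus
print(segregation_chi2([20, 48, 20], [1, 2, 1]))  # 1:2:1 locus
```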

GMendel 2.0 runs under a UNIX operating system and requires a Fortran compiler. For information on obtaining a copy of the program write to Steven Knapp, Dept. of Crop and Soil Science, Oregon State Univ., Corvallis, OR 97331-3002 or send e-mail to [email protected].


Please Note: Notes submitted to the Maize Genetics Cooperation Newsletter may be cited only with consent of the authors
