Mendelian interpretation of offspring-parent regressions.

 

Dr. K. Mather, on his recent visit to this country, discussed some extensions of the methods proposed by Fisher, Immer and Tedin (Genetics, 1932) for estimating dominance bias in quantitative inheritance.

 

My own attack in the last News Letter is also an extension of the same methods. My approach seems to gain some advantages from employing highly inbred or homozygous parents. Uncertainty about linkage effects is largely eliminated. Dominance does not reduce the correlation between phenotypes of homozygous parents and the gametes they produce. I have found no particular advantage in requiring equal frequency of a and A alleles by confining study to populations which stem from a single selfed heterozygote in each case. Samples of homozygous lines, selected or otherwise, seem to be satisfactory. If all of this is true, the method must have wide utility and may be presented again from more of a Mendelian and less of a mathematical viewpoint.

 

If the heterozygote aAbBcCdD is crossed to the multiple recessive tester aabbccdd, testcross progeny may be classified on kinds and frequencies of four distinct qualitative characters to obtain a reflected view of dominant alleles in gametes of the heterozygote. This is the method of classical genetics. It has been seldom noted here that regression of number of plus characters in testcross progeny on number of dominant alleles in parent gamete is 1.0. Every plus allele in a gamete provides a plus character in the zygote, regardless of linkage.

 

The top dominant AABBCCDD is clearly worthless as a tester. Offspring-parent regression is zero. Intermediate testers are efficient in inverse proportion to the number or proportion of loci of AA type. Thus if testers in general are of aa type at one half of the loci which are heterozygous in the F1 to be analyzed, a dominant allele in F1 gametes will provide a dominant character in testcross progeny in one half of the cases. In the other half the dominant character is always provided by the tester and a dominant allele in the F1 gamete can add nothing more. Regression is one half. Reduction of regression by dominant genes in the tester is purely a dominance effect. This dominance effect is reduced one half by selfing the testcrosses.

 

It hardly seems necessary to labor with the transfer of these concepts to the general field of multigenic inheritance where effects of the several genes combine in a single quantitative measure, and where dominance is taken into account quantitatively. In the former case, concern is primarily with frequencies. Basic effects of genes and dominance effects are both tacitly defined as unity throughout. In the latter case the two effects must be defined separately and quantitatively. We cannot assume that either is unity since we are concerned with degree of expression, not with just whether the character is or is not expressed.

 

In my attack the array of F1 gametes is replaced with an array of gametes from an array of homozygous parents. The purpose is no longer to obtain a reflected picture of the gametic array. That array is already revealed in the array of homozygous parents. The purpose now is to estimate regressions of testcross progeny on gamete or homozygous parent with different testers. If both the bottom recessive and top dominant were available as testers, the decline in regression from one case to the other would reveal directly the average degree of dominance. But neither of those two testers is likely to be available in multigenic cases. We are restricted to a study of regression relations with such testers as we may be able to develop.

 

For quantitative definitions of basic gene effects and dominance effects we may well employ the general scheme of Fisher, et al (1932) which is essentially that of Fisher in his 1918 paper on correlation between relatives, and of Mather on his recent visit. If the basic, phenotypic effect of substituting A for a is "d", phenotypes of aa, aA, AA are 0, d, 2d. The heterozygote is strictly intermediate. But if there is in addition an interaction of a with A to provide also a dominance effect "kd", the phenotypes are 0, d+kd, 2d. These quantities are deviations from a working origin at aa. Deviation of the heterozygote from strict intermediacy is kd, (h in the notation of Fisher, et al).

 

For a multiple set of genes a1A1, a2A2 --- anAn, we may as well let d and kd be average values for the several loci. Then if gene action is additive each genotype is evaluated (estimated) by summing the several d's and kd's. The simplest case is n = 2. The checkerboard frame is

 

 

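  P2   Gamete |
  4d   A1A2   |  2d + 2kd    3d + kd     3d + kd     4d + 0
  2d   a1A2   |  d + kd      2d + 2kd    2d + 0      3d + kd
  2d   A1a2   |  d + kd      2d + 0      2d + 2kd    3d + kd
  0    a1a2   |  0 + 0       d + kd      d + kd      2d + 2kd
              +------------------------------------------------
       Gamete    a1a2        A1a2        a1A2        A1A2
       P1        0           2d          2d          4d

                          Table 1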

 

Phenotypes of the 3 parent classes are written on the margins along with the gametes of each class. Phenotypes alone are written in interior cells for offspring. It may be desirable in teaching to write genotypes also in the cells and to evaluate some of them by counting a d for each A allele and a kd for each aA locus, that is, for each interaction of unlike alleles. It may also be desirable to write genotypes of parents and evaluate them, noting absence of dominance effects.
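The counting rule just described is mechanical enough to write down. The sketch below is illustrative only: the function name `cell` and the 0/1 coding of gametes are assumptions of this example. It returns the number of d's and the number of kd's for any pair of gametes and prints the interior of the two-locus checkerboard row by row.

```python
def cell(gamete_p1, gamete_p2):
    """Return (number of d, number of kd) for the zygote formed by two gametes.

    Gametes are tuples of 0/1 per locus, with 1 coding A and 0 coding a.
    """
    basic = sum(gamete_p1) + sum(gamete_p2)                            # one d per A allele
    dominance = sum(1 for x, y in zip(gamete_p1, gamete_p2) if x != y)  # one kd per aA locus
    return basic, dominance


# Gametes named as on the table margins.
GAMETES = {"a1a2": (0, 0), "A1a2": (1, 0), "a1A2": (0, 1), "A1A2": (1, 1)}

for row_name in ("A1A2", "a1A2", "A1a2", "a1a2"):          # P2 margin, top to bottom
    row = [cell(GAMETES[col_name], GAMETES[row_name])
           for col_name in ("a1a2", "A1a2", "a1A2", "A1A2")]  # P1 margin, left to right
    print(row_name, "  ".join(f"{b}d+{h}kd" for b, h in row))
```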

 

Table 1 is a simple regression surface. Our avowed purpose is to study the effect of k on the shape of the surface that we may interpret shapes of data surfaces in terms of k, average degree of dominance.

 

In practice the homozygotes a1a1 A2A2 and A1A1 a2a2 are ordinarily indistinguishable. This means that the two center columns and two center rows of table 1 may as well be pooled to conform with the situation of data on a quantitative character. Pooling provides,

 

 

  P2  |
  4d  |  2d + 2kd    3d + kd     4d + 0
  2d  |  d + kd      2d + kd     3d + kd
  0   |  0 + 0       d + kd      2d + 2kd
      +----------------------------------
  P1     0           2d          4d

                  Table 2

 

Note that the entry in the central cell of table 2, for example, is the mean of the four central cells of table 1. It is the predicted (average) result of crosses of homozygotes of the types indicated on the margins. Deviations of the four crosses from the mean are deviations from regression due entirely to dominance, that is, to variation in degree of heterozygosity, or specific combining ability. These variations are not predictable from data on the parents. The teacher should write frequency distributions of individual crosses in each cell of table 2 along with the means given here.
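The pooling step can be reproduced numerically. This is a minimal sketch with illustrative values d = 1 and k = 1/2 (any values would do); the helper names are assumptions of the example. It averages the crosses within each pair of parent phenotype classes and lists the deviations of the four central crosses from their mean.

```python
from fractions import Fraction
from itertools import product


def cell_value(g1, g2, d, k):
    """Phenotype of the cross: d per A allele plus k*d per heterozygous locus."""
    heterozygous = sum(1 for x, y in zip(g1, g2) if x != y)
    return d * (sum(g1) + sum(g2)) + k * d * heterozygous


def pooled_table(d, k, n_loci=2):
    """Average the crosses within each (P1 class, P2 class) pair of parent phenotypes."""
    gametes = list(product((0, 1), repeat=n_loci))
    sums, counts = {}, {}
    for g1, g2 in product(gametes, repeat=2):
        key = (2 * d * sum(g1), 2 * d * sum(g2))   # phenotypes of the homozygous parents
        sums[key] = sums.get(key, 0) + cell_value(g1, g2, d, k)
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


d, k = Fraction(1), Fraction(1, 2)   # illustrative values only
table2 = pooled_table(d, k)
central = table2[(2 * d, 2 * d)]
print(central)                        # mean of the four central crosses, 2d + kd

# Deviations of the four central crosses from their mean are each +kd or -kd.
middle = [(1, 0), (0, 1)]
print([cell_value(g1, g2, d, k) - central for g1 in middle for g2 in middle])
```

Exact fractions are used so that the pooled means can be compared with the symbolic entries without rounding.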

 

Note further that, while tables 1 and 2 represent two-factor checkerboards of classical genetics with gametes of F1 recorded on the margins and F2 phenotypes in interior cells, the view here is arrays of homozygous lines on the margins with F1 phenotypes of crosses of such lines in cells of the tables. Subsequently, interior values will be referred to as F1s in agreement with modern corn breeding practice. The two situations are strictly analogous only when a and A are equally frequent in the sample of homozygous parents.

 

If table 2 is expanded to include many loci, parent values are 0, 2d, 4d, - - - - 2nd. A statement of the mean F1 of any cell in terms of parent values would be the general regression function of F1 on P1 and P2. The solution of this problem was given in the previous News Letter. The mean of any cell in a table of the type of table 2 may be calculated by solving a smaller checkerboard. Detailed arrays of gametes of the two parent types are written on the margins. But this is merely taking the product of two gametic arrays, a fundamental principle of Mendelism. Hence, if u and w are the proportions of loci AA in P1 and P2 respectively, gametic arrays are represented in general by (1-u)a + uA and (1-w)a + wA. In all of the crosses of P1 type parents x P2 type parents together, expectations are (1-u)(1-w)aa, [u(1-w) + w(1-u)] aA, uwAA. The sum of these three proportions, each multiplied by n and by the respective phenotypes 0, d+kd, 2d, is the expected increment of mean F1 over the multiple recessive T. Making the substitutions u = (P1-T)/2nd and w = (P2-T)/2nd provides the desired function.
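The product of the two gametic arrays translates directly into a short calculation. The sketch below follows the steps of the paragraph; the function name and argument names are assumptions of this example, and the numerical values are illustrative. With two loci, d = 1, k = 0.5 and T = 0 it reproduces the cells of table 2.

```python
def expected_f1(p1, p2, n, d, k, t=0.0):
    """Expected mean F1 of crosses between P1-type and P2-type homozygous parents.

    n: number of loci; d: basic gene effect; k: degree of dominance;
    t: phenotype of the multiple recessive T.
    """
    u = (p1 - t) / (2 * n * d)   # proportion of loci AA in P1
    w = (p2 - t) / (2 * n * d)   # proportion of loci AA in P2
    # Product of the gametic arrays (1-u)a + uA and (1-w)a + wA.
    freq_aa = (1 - u) * (1 - w)
    freq_aA = u * (1 - w) + w * (1 - u)
    freq_AA = u * w
    increment = n * (freq_aa * 0 + freq_aA * (d + k * d) + freq_AA * 2 * d)
    return t + increment


# Illustrative values: two loci, d = 1, k = 0.5, T = 0, laid out like table 2.
for p2 in (4, 2, 0):
    print([expected_f1(p1, p2, n=2, d=1.0, k=0.5) for p1 in (0, 2, 4)])
```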

 

The concept u = (P1-T)/2nd might be presented effectively to a class by laying off an arbitrary scale to represent the range of phenotype from bottom recessive to top dominant.

           P1
____________|________________________
T                                 (2nd + T)

The scheme is to count 2d for each locus AA as the increment above T, hence 2nd where all n loci are AA. The position of any homozygote P1 on this scale reveals directly the proportion of loci AA in P1, u = (P1-T)/2nd.

 

The purpose of T is to adjust for the possibility that the phenotype of the bottom recessive is not zero on the data scale.

 

It is instructive to verify from table 2 the results reported last year. The left column may represent a series of hybrids having a common parent P1, the tester, which is aa at each locus. Lines being tested are represented on the parallel margin as different values of the variable P2. It is clear that if the tester is completely recessive, every substitution of AA for aa in P2 will provide a substitution of aA for aa in F1. Regression of F1 on P2 is (aA-aa)/(AA-aa) or (one basic gene effect plus one dominance effect)/(two basic effects) or (1+k)/2. Note that the increment from one cell to the next, left column of table 2, is d+kd and that the corresponding increment in the P2 column is 2d. The ratio is (1+k)/2. When P1 is aa throughout, P1-T = 0. Substitute in last year's formula for bp to obtain bp = (1+k)/2, if P1-T = 0.

 

Similarly from the right column of table 2, bp = (1-k)/2 when P1 is AA throughout, (P1-T) = 2nd. Expansion of table 2 to include many loci will not provide different results.

 

If, as in most actual cases, some proportion u of the loci of P1 is AA and 1-u is aa, the weighted mean increment of F1 is [n(1-u)(d+kd) + nu(d-kd)]/n. Or the weighted mean of slopes is (1-u)(1+k)/2 + u(1-k)/2 = (1+k)/2 - uk. Substituting u = (P1-T)/2nd, bp = (1+k)/2 - (k/2nd)(P1-T).

 

If bp is (1+k)/2 in the left column of table 2 and (1-k)/2 in the right column, the increment of bp across the table is [(1-k) - (1+k)]/2 = -k. The concurrent increment of u is 1, and of P1 it is 2nd. Regression of bp on u is -k and on P1 it is -k/2nd, as the formula bp = (1+k)/2 - (k/2nd)(P1-T) expressly states.
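The two routes to bp, the weighted mean of the extreme slopes and the closed formula, can be compared directly. A minimal sketch with illustrative values (n = 2, d = 1, k = 0.5, T = 0); the function names are assumptions of this example.

```python
def bp_weighted(p1, n, d, k, t=0.0):
    """Partial regression of F1 on P2 as a weighted mean of the two extreme slopes."""
    u = (p1 - t) / (2 * n * d)          # proportion of loci AA in the tester P1
    return (1 - u) * (1 + k) / 2 + u * (1 - k) / 2


def bp_formula(p1, n, d, k, t=0.0):
    """Closed form quoted in the text: bp = (1+k)/2 - (k/2nd)(P1-T)."""
    return (1 + k) / 2 - (k / (2 * n * d)) * (p1 - t)


n, d, k = 2, 1.0, 0.5                      # illustrative values only
slopes = [bp_formula(p1, n, d, k) for p1 in (0, 2, 4)]
print(slopes)                              # slopes for the three columns of table 2
print((slopes[-1] - slopes[0]) / (4 - 0))  # regression of bp on P1, equal to -k/2nd
```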

 

Thus, the values reported last year may be verified and their interpretations clarified by direct inspection of table 2.

 

If it is not immediately obvious that the regression estimates are unaffected by linkage and by relative frequencies of a and A alleles, except as noted, the student may need to work out some specific examples with numerical values assigned to d, kd, q, and per cent crossover and calculate regressions by machine formulas as well as by direct substitution in present formulas.

 

It is also clear that bp for the midcolumn or midrow of table 2 is one half, and that mean bp for all three columns or all three rows is one half. This latter case of mean bp for the whole table is the one usually calculated for regression of offspring on one parent. If a and A alleles are equally frequent, frequencies of the three columns are expected in the ratio 1:2:1 and dominance effects on regression are effectively cancelled. Note that bp is always one half if k = 0. But if a alleles are in the minority, the frequency of the right column will be greater than that of the left column and expectation is that dominance will depress mean partial regression below one half. This seems to be an adequate explanation of low regressions of yields of corn hybrids on yields of inbred parents. No alternative explanations of higher order interactions of genes or of inefficient plot technic appear to be necessary.
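The effect of allele frequency on the mean regression can be illustrated with a small calculation. This sketch assumes, for illustration only, that each locus of a homozygous tester is AA with probability freq_A independently of the others, so that tester classes follow a binomial array (1:2:1 for two loci at equal frequency). The function name is hypothetical.

```python
from math import comb


def mean_bp(freq_A, k, n=2):
    """Mean partial regression over all tester classes.

    Assumption for this sketch: each locus of a homozygous tester is AA with
    probability freq_A, independently, so tester classes follow a binomial array.
    """
    total = 0.0
    for j in range(n + 1):                       # j = number of AA loci in the tester
        class_freq = comb(n, j) * freq_A ** j * (1 - freq_A) ** (n - j)
        u = j / n
        total += class_freq * ((1 + k) / 2 - u * k)
    return total


for freq_A in (0.5, 0.7, 0.9):
    print(freq_A, round(mean_bp(freq_A, k=1.0), 6))
```

Under this assumption the mean works out to (1+k)/2 minus k times the frequency of A, so it is one half at equal frequencies and falls below one half as the a alleles become rarer, provided k is positive.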

 

The function, F1 = b1aP1 + b1bP2 + b2P1P2 + C may be fitted to data on samples of homozygous parents and the several F1 crosses, or F2 by selfing F1. For F1 data, estimates of b1 are estimates of (1+k+kT/nd)/2, on the assumption of additive gene action. Estimates of b2 are estimates of -k/2nd. Regression of bp on P1 or on P2 is the same estimate of -k/2nd.
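The relations between the fitted coefficients and the genetic quantities can be sketched as follows. Two things in this example are not stated in the text and are assumptions of the sketch: the value of the constant C, which comes from expanding the gametic product with u = (P1-T)/2nd and w = (P2-T)/2nd, and the inversion in `solve_surface`, which is one possible route and not necessarily the procedure of the previous News Letter. The inversion uses the property that, on the fitted surface with k not zero, F1 equals a common parent value P1 = P2 = P only at the bottom recessive and the top dominant.

```python
from math import sqrt


def regression_coefficients(n, d, k, t=0.0):
    """Coefficients of F1 = b1*P1 + b1*P2 + b2*P1*P2 + C under additive gene action."""
    b1 = (1 + k + k * t / (n * d)) / 2
    b2 = -k / (2 * n * d)
    # Constant term: not given in the text, obtained here by expanding the model.
    c = -k * t - k * t * t / (2 * n * d)
    return b1, b2, c


def fitted_f1(p1, p2, b1, b2, c):
    """Evaluate the regression surface at one pair of parent values."""
    return b1 * p1 + b1 * p2 + b2 * p1 * p2 + c


def solve_surface(b1, b2, c):
    """Recover (bottom recessive T, top dominant, k) from the coefficients.

    Assumes b2 != 0. T and the top dominant are taken as the two roots of
    b2*P**2 + (2*b1 - 1)*P + c = 0, the points where F1(P, P) = P.
    """
    disc = (2 * b1 - 1) ** 2 - 4 * b2 * c
    root_a = (-(2 * b1 - 1) + sqrt(disc)) / (2 * b2)
    root_b = (-(2 * b1 - 1) - sqrt(disc)) / (2 * b2)
    bottom, top = min(root_a, root_b), max(root_a, root_b)
    k = 2 * (b1 + b2 * bottom) - 1
    return bottom, top, k


b1, b2, c = regression_coefficients(n=2, d=1.0, k=0.5, t=10.0)   # illustrative values
print(b1, b2, c)
print(fitted_f1(12.0, 12.0, b1, b2, c))   # central cell of table 2, shifted by T = 10
print(solve_surface(b1, b2, c))           # round trip back to T, T + 2nd, and k
```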

 

As indicated last year, the general regression function may be solved to obtain estimates of bottom recessive, top dominant, and average degree of dominance. From the regression of bp on P1, the estimate of P1 for bp = 0 may be obtained. This is the critical value of P1. Such a tester combines equally well with poor, medium and good lines on the average. Better testers may be expected to combine better with low lines than with high lines; that is, bp is negative.
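The estimate of the critical value can be sketched as a two-stage regression: first the simple regression of each F1 column on the values of the other parents, then the regression of those slopes on the value of the constant parent. The example below runs on synthetic, noise-free F1s generated from the additive model (n = 4, d = 1, T = 10); these numbers are illustrative and are not data from this note, and all function names are assumptions of the sketch.

```python
def simple_regression(xs, ys):
    """Least-squares slope and intercept of ys on xs."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def critical_parent_value(parents, f1):
    """Estimate the value of P1 at which bp = 0.

    parents: phenotypes of the homozygous lines.
    f1[i][j]: F1 of line i (constant parent) crossed with line j, used for i != j.
    """
    slopes = []
    for i in range(len(parents)):
        others = [j for j in range(len(parents)) if j != i]
        bp, _ = simple_regression([parents[j] for j in others],
                                  [f1[i][j] for j in others])
        slopes.append(bp)
    # Second stage: regression of the first-order regressions on P1.
    slope, intercept = simple_regression(parents, slopes)
    return -intercept / slope


def model_f1(p1, p2, k, n=4, d=1.0, t=10.0):
    """Synthetic F1 from the product of gametic arrays (illustrative model values)."""
    u = (p1 - t) / (2 * n * d)
    w = (p2 - t) / (2 * n * d)
    return t + n * ((u * (1 - w) + w * (1 - u)) * (d + k * d) + u * w * 2 * d)


lines = [10.0, 12.0, 14.0, 16.0, 18.0]     # bottom recessive 10, top dominant 18
for k in (0.5, 2.0):
    f1 = [[model_f1(p1, p2, k) for p2 in lines] for p1 in lines]
    print(k, round(critical_parent_value(lines, f1), 6))
```

In this synthetic case the critical value lies above the top dominant when k = 0.5 and within the range of the lines when k = 2, which is the pattern the text associates with overdominance.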

 

The several estimates reported last year are in all respects surprisingly consistent with the hypothesis of overdominance in vigor of corn. Tests of significance of b2 reported last year are apparently in error. The appropriate test is for significance of departure from linear regression (Snedecor 14.3). By this test no single estimate of b2 is significantly different from zero, which may mean merely that numbers are too small. The crucial point for overdominance is whether k is significantly greater than 1. An additional set of data from C. M. Woodworth, Oren Bolin and Earl R. Leng of the Illinois Experiment Station gives essentially the same picture. The critical value of P1 is 4.4 bu./A. Yields of inbred parents range from 2 to 40. Mean yield of F1s is 103.

 

We have then one more set of data consistent with the others in supporting the conclusion that the more vigorous inbred lines in hand are worthless or worse as testers for general combining ability, since bp is zero or negative with such lines as testers.

 

That the few sets of data are not crucial for overdominance is not surprising. They would not be crucial even if the test for k greater than 1 showed high significance in each case. So few cases of monogenic inheritance and linkage would not prove the chromosome theory of heredity. When many more sets of data on different types of characters in both cross- and self-fertilized species have been analyzed we may have a clearer picture of where and to what extent dominance bias occurs. But even then the results can hardly be conclusive and we will probably still need to be content with theories which agree best with the whole body of evidence.

 

There is a suggestion in corn yield data that the relative order of rank within either a group of inbred lines or within a group of hybrids may be quite different in two different environments. Further, the shape of the fitted regression surface may also vary greatly in response to environmental effects. If alleles A' and A perform different functions in the sense of East, A'A' may be usually inferior but sometimes superior to AA. The heterozygote A'A, if better buffered to environmental shifts, may be on the average superior to either homozygote. In these events, A will probably be the more frequent and also the dominant favorable in the usual environment. But the possibility exists that in some environments A' will be the dominant favorable, with dominance still in the direction of greater vigor. The dominant favorable A' will be in low frequency. The ratio k of an average dominance effect to an average basic effect may be changed and with it the equilibrium gene frequency ratio. All of these shifts will be likely to appear in the regression analyses for a given sample of stable lines and F1s in different environments.

 

Fred H. Hull

 

Addendum.

 

Since the above report was typed I have received from Dr. Paul H. Harvey yield records on 12 lines and the 66 F1s, and have now completed the first part of the analysis. Yields of lines (selfed four times) ranged from 12 to 24 bu./A. Mean F1 is 46. The critical value of P is 25, one bushel above the top line. These data seem to agree with the other sets and the conclusions drawn from them in all respects.

 

These last results have given me sufficient confidence to propose a further attack for which a considerable body of data is now available: data on F1s but not on the parent lines. Mean F1 for any column of table 2 may be considered a measure of the general combining ability G of the constant parent for that column. It is easily demonstrated that G is a linear function of P. Hence, we may as well estimate the G value of a tester which provides zero partial regression of F1 on G. Where the several F1s of a group of lines have been tested in as many as four replications, one half of the replications may be employed to estimate G values for the lines. The remaining replications may estimate F1s. Correlation of experimental errors in the two estimates is thus eliminated. The analysis, as before, is to run the simple regression of each F1 column on the parallel column of G; then to run the simple regression of the first order regressions on G values of the respective constant parents; and finally to estimate G for bp = 0. If this critical value of G is within the range of the data, the only direct interpretation I have found is overdominance.

 

This kind of analysis has been run with the data on Late Yellow Single Crosses from the cooperative tests of the U.S. Department of Agriculture with Ohio, Indiana, Illinois, Kansas, Nebraska and Oklahoma in 1943. Mean G for each line was based on the data of five states for analysis with F1 data of the sixth state in each case. The critical value of G is below the G measure of the top line in three cases and slightly above in two cases. In the sixth case the trend of regression is upward and the data are apparently not consistent with any dominance bias toward high yield. Interstate correlations of G values of the ten lines are mostly positive but not very large. This kind of analysis is apparently of some worth where such data are available but it would seem that the attack outlined in the preceding paragraph would be more efficient and also applicable to more data.

 

Fred H. Hull