Mendelian interpretation of offspring-parent regressions.
Dr. K. Mather, on his recent visit to this country,
discussed some extensions of methods proposed by Fisher, Immer and Tedin
(Genetics 1932) for estimation of dominance bias in quantitative inheritance.
My own attack in the last News Letter is also an
extension of the same. My approach seems to have some advantages from employing
highly inbred or homozygous parents. Uncertainty on linkage effects is largely
eliminated. Dominance does not reduce correlation between phenotypes of
homozygous parents and the gametes they produce. I have found no particular
advantage in requiring equal frequency of a and A alleles by confining study to
populations which stem from a single selfed heterozygote in each case. Samples
of homozygous lines, selected or otherwise, seem to be satisfactory. If all of
this be true the method must have a wide utility and may be presented again
from more of a Mendelian and less of a mathematical viewpoint.
If the heterozygote aAbBcCdD is crossed to the
multiple recessive tester aabbccdd, testcross progeny may be classified on
kinds and frequencies of four distinct qualitative characters to obtain a
reflected view of dominant alleles in gametes of the heterozygote. This is the
method of classical genetics. It has been seldom noted here that regression of
number of plus characters in testcross progeny on number of dominant alleles in
parent gamete is 1.0. Every plus allele in a gamete provides a plus character
in the zygote, regardless of linkage.
The top dominant AABBCCDD is clearly worthless as a
tester. Offspring-parent regression is zero. Intermediate testers are efficient
in inverse proportion to the number or proportion of loci of AA type. Thus if
testers in general are of aa type at one half of the loci which are
heterozygous in the F1 to be analyzed, a dominant allele in F1
gametes will provide a dominant character in testcross progeny in one half of
the cases. In the other half the dominant character is always provided by the
tester and a dominant allele in the F1 gamete can add nothing more.
Regression is one half. Reduction of regression by dominant genes in the tester
is purely a dominance effect. This dominance effect is reduced one half by
selfing the testcrosses.
It hardly seems necessary to labor with the transfer
of these concepts to the general field of multigenic inheritance where effects
of the several genes combine in a single quantitative measure, and where
dominance is taken into account quantitatively. In the former case, concern is
primarily with frequencies. Basic effects of genes and dominance effects are
both tacitly defined as unity throughout. In the latter case the two effects
must be defined separately and quantitatively. We cannot assume that either is
unity since we are concerned with degree of expression, not with just whether
the character is or is not expressed.
In my attack the array of F1 gametes is
replaced with an array of gametes from an array of homozygous parents. The
purpose is no longer to obtain a reflected picture of the gametic array. That
array is already revealed in the array of homozygous parents. The purpose now
is to estimate regressions of testcross progeny on gamete or homozygous parent
with different testers. If both the bottom recessive and top dominant were
available as testers, decline in regression from one case to the other would
reveal directly the average degree of dominance. But neither of those two
testers is likely to be available in multigenic cases. We are restricted to a study
of regression relations with such testers as we may be able to develop.
For quantitative definitions of basic gene effects
and dominance effects we may well employ the general scheme of Fisher, et al
(1932) which is essentially that of Fisher in his 1918 paper on correlation
between relatives, and of Mather on his recent visit. If the basic, phenotypic
effect of substituting A for a is "d", phenotypes of aa, aA, AA are
0, d, 2d. The heterozygote is strictly intermediate. But if there is in addition
an interaction of a with A to provide also a dominance effect "kd",
the phenotypes are 0, d+kd, 2d. These quantities are deviations from a working
origin at aa. Deviation of the heterozygote from strict intermediacy is kd, (h
in the notation of Fisher, et al).
For a multiple set of genes a1A1, a2A2, ..., anAn, we may as well let d
and kd be average values for the several loci. Then if gene action is additive
each genotype is evaluated (estimated) by summing the several d's and kd's. The
simplest case is n = 2. The checkerboard frame is

  P2    gametes |              F1 phenotypes
  4d    A1A2    |  2d+2kd    3d+kd     3d+kd     4d
  2d    a1A2    |  d+kd      2d+2kd    2d        3d+kd
  2d    A1a2    |  d+kd      2d        2d+2kd    3d+kd
  0     a1a2    |  0         d+kd      d+kd      2d+2kd
  --------------+---------------------------------------
        gametes |  a1a2      A1a2      a1A2      A1A2
        P1      |  0         2d        2d        4d

Table 1
Phenotypes of the 3 parent classes are written on
the margins along with the gametes of each class. Phenotypes alone are written
in interior cells for offspring. It may be desirable in teaching to write
genotypes also in the cells and to evaluate some of them by counting a d for
each A allele and a kd for each aA locus or each interaction of unlike alleles.
It may also be desirable to write genotypes of parents and evaluate them,
noting absence of dominance effects.
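The counting rule described for Table 1 (one d for each A allele, one kd for each aA locus) can be written out mechanically. This is a minimal sketch in Python, not part of the original report. The names `cross_value` and `table1` are hypothetical, and each cell is stored as a pair (number of d's, number of kd's) rather than as a number, so that no values need be assumed for d or k.

```python
from itertools import product

def cross_value(gamete1, gamete2):
    """Return (number of d's, number of kd's) for the zygote formed by two gametes.

    Each gamete is a tuple with one entry per locus: 0 for allele a, 1 for allele A.
    """
    d_count = sum(a + b for a, b in zip(gamete1, gamete2))              # one d per A allele
    kd_count = sum(1 for a, b in zip(gamete1, gamete2) if a != b)       # one kd per aA locus
    return d_count, kd_count

# Two loci: (0, 0) = a1a2, (0, 1) = a1A2, (1, 0) = A1a2, (1, 1) = A1A2
gametes = list(product([0, 1], repeat=2))
table1 = {(g1, g2): cross_value(g1, g2) for g1 in gametes for g2 in gametes}

for g2 in reversed(gametes):  # print rows from the top dominant down, as in Table 1
    print(g2, [table1[(g1, g2)] for g1 in gametes])
```

A cell such as `table1[((0, 0), (1, 1))]` is `(2, 2)`, that is 2d + 2kd, while crosses of a line with itself carry no kd term.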
Table 1 is a simple regression surface. Our avowed
purpose is to study the effect of k on the shape of the surface that we may
interpret shapes of data surfaces in terms of k, average degree of dominance.
In practice the homozygotes a1a1 A2A2 and A1A1 a2a2
are ordinarily indistinguishable. This means that the two center columns and
two center rows of table 1 may as well be pooled to conform with the situation
of data on a quantitative character. Pooling provides,

  P2   |          F1 means
  4d   |  2d+2kd    3d+kd     4d
  2d   |  d+kd      2d+kd     3d+kd
  0    |  0         d+kd      2d+2kd
  -----+-----------------------------
  P1   |  0         2d        4d

Table 2
Note that the entry in the central cell, e.g., of
table 2 is the mean of the four central cells of table 1. It is the predicted
(average) result of crosses of homozygotes of the types indicated on the
margins. Deviations of the four crosses from the mean are deviations from
regression due entirely to dominance, that is, to variation in degree of
heterozygosity (specific combining ability). These variations are not
predictable from data on the parents. The teacher should write frequency
distributions of individual crosses in each cell of table 2 along with the
means given here.
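The pooling of Table 1 into Table 2 can be sketched as an average over parents of like phenotype. The following Python sketch is not part of the original report. It assumes two loci and hypothetical names, and it keys each pooled cell by the number of loci AA in P1 and in P2 (0, 1, 2, corresponding to phenotypes 0, 2d, 4d).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def cross_value(gamete1, gamete2):
    """(number of d's, number of kd's) for the cross of two homozygous lines."""
    d_count = sum(a + b for a, b in zip(gamete1, gamete2))
    kd_count = sum(1 for a, b in zip(gamete1, gamete2) if a != b)
    return d_count, kd_count

gametes = list(product([0, 1], repeat=2))

# Collect the individual crosses that fall in each pooled cell.
cells = defaultdict(list)
for g1, g2 in product(gametes, repeat=2):
    cells[(sum(g1), sum(g2))].append(cross_value(g1, g2))

# Cell mean of Table 2 = mean d count and mean kd count of the pooled crosses.
table2 = {}
for key, values in cells.items():
    mean_d = Fraction(sum(v[0] for v in values), len(values))
    mean_kd = Fraction(sum(v[1] for v in values), len(values))
    table2[key] = (mean_d, mean_kd)

central_crosses = sorted(cells[(1, 1)])
```

The central cell pools four crosses with kd counts 0, 0, 2, 2 and a common d count of 2, so its mean is 2d + kd; the spread of the kd counts around that mean is the unpredictable part ascribed above to specific combining ability.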
Note further that, while tables 1 and 2 represent
two-factor checkerboards of classical genetics with gametes of F1
recorded on the margins and F2 phenotypes in interior cells, the
view here is arrays of homozygous lines on the margins with F1
phenotypes of crosses of such lines in cells of the tables. Subsequently,
interior values will be referred to as F1s in agreement with modern
corn breeding practice. The two situations are strictly analogous only when a
and A are equally frequent in the sample of homozygous parents.
If table 2 is expanded to include many loci, parent
values are 0, 2d, 4d, ..., 2nd. A statement of the mean F1 of any
cell in terms of parent values would be the general regression function of F1
on P1 and P2. The solution of this problem was given
in the previous News Letter. The mean of any cell in a table of the type of
table 2, may be calculated by solving a smaller checkerboard. Detailed arrays
of gametes of the two parent types are written on the margins. But this is
merely taking the product of two gametic arrays, a fundamental principle of
Mendelism. Hence, if u and w are the proportions of loci AA in P1
and P2 respectively, gametic arrays are represented in general by
(1-u)a + uA and (1-w)a + wA. In all of the crosses of P1 type
parents x P2 type parents together, expectations are (1-u)(1-w)aa,
[u(1-w) + w(1-u)] aA, uwAA. The sum of these three proportions, each
multiplied by n and by the respective phenotypes 0, d+kd, 2d, is the expected
increment of mean F1 over the multiple recessive T. Making the
substitutions u = (P1-T)/2nd and w = (P2-T)/2nd provides
the desired function.
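The product of the two gametic arrays can be evaluated numerically as a check. The sketch below is in Python and is not part of the original report. The function names and the trial values of n, d, k and T are hypothetical, and the closed form in `f1_closed_form` is my own expansion of the product after the substitutions for u and w, offered as an assumption rather than as a quotation of last year's formula. Here 2nd means 2 times n times d.

```python
def f1_from_arrays(p1, p2, n, d, k, t=0.0):
    """Expected mean F1 from the product of the two gametic arrays."""
    u = (p1 - t) / (2 * n * d)          # proportion of loci AA in P1
    w = (p2 - t) / (2 * n * d)          # proportion of loci AA in P2
    freq_het = u * (1 - w) + w * (1 - u)  # expected proportion of loci aA
    freq_dom = u * w                      # expected proportion of loci AA
    # Loci aa contribute 0; aA contribute d + kd; AA contribute 2d.
    return t + n * (freq_het * (d + k * d) + freq_dom * 2 * d)

def f1_closed_form(p1, p2, n, d, k, t=0.0):
    """The same expectation expanded in terms of (P1 - T) and (P2 - T)."""
    return (t + (1 + k) / 2 * ((p1 - t) + (p2 - t))
            - k / (2 * n * d) * (p1 - t) * (p2 - t))

# Central cell of Table 2 with d = 1, k = 0.5, T = 0: expected 2d + kd = 2.5
central = f1_from_arrays(2.0, 2.0, n=2, d=1.0, k=0.5)
```

Written this way the function is linear in P1, linear in P2, and carries a single product term in P1 and P2 whose coefficient is -k/2nd.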
The concept u = (P1-T)/2nd might be
presented effectively to a class by laying off an arbitrary scale to represent
the range of phenotype from bottom recessive to top dominant.

  |____________|________________________|
  T            P1                   (2nd + T)

The scheme is to count 2d for each locus AA as the increment above T,
hence, 2nd where all n loci are AA. The position of any homozygote P1
on this scale reveals directly the proportion of loci AA in P1, u =
(P1-T)/2nd.
The purpose of T is to adjust for the possibility
that the phenotype of the bottom recessive is not zero on the data scale.
It is instructive to verify from table 2 results
reported last year. The left column may represent a series of hybrids having a
common parent P1, the tester, which is aa at each locus. Lines being
tested are represented on the parallel margin as different values of the
variable P2. It is clear that if the tester is completely recessive,
every substitution of AA for aa in P2 will provide a substitution of
aA for aa in F1. Regression of F1 on P2 is
(aA-aa)/(AA-aa) or (one basic gene effect plus one dominance effect)/(two basic
effects) or (1+k)/2. Note that the increment from one cell to the next, left
column of table 2, is d+kd and that the corresponding increment in the P2
column is 2d. The ratio is (1+k)/2. When P1 is aa throughout P1-T
= 0. Substitute in last year's formula for bp to obtain bp = (1+k)/2, if P1-T
= 0.
Similarly from the right column of table 2, bp =
(1-k)/2, when P1 is AA throughout, (P1-T) = 2nd.
Expansion of table 2 to include many loci will not provide different results.
If, as in most actual cases, some proportion u of
the loci of P1 is AA and 1-u is aa, the weighted mean increment of F1
is [n(1-u)(d+kd) + nu(d-kd)] /n. Or the weighted mean of slopes is (1-u)(1+k)/2
+ u(1-k)/2 = (1+k)/2 -uk. Substituting u = (P1-T)/2nd, bp = (1+k)/2
- (k/2nd)(P1-T).
If bp is (1+k)/2 in the left column of table 2 and
(1-k)/2 in the right column the increment of bp across the table is [(1-k) -
(1+k)]/2 = -k. The concurrent increment of u is 1, and of P1 it is 2nd.
Regression of bp on u is -k and on P1 it is -k/2nd, as the
formula bp = (1+k)/2 - (k/2nd)(P1-T) expressly states.
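The column slopes of Table 2 can be compared with the formula for bp in a few lines. This Python sketch is not part of the original report. It assumes n = 2, T = 0 and the arbitrary trial values d = 1, k = 0.5; the function names are hypothetical.

```python
def expected_f1(p1, p2, n, d, k, t=0.0):
    """Expected mean F1 from the product of the two gametic arrays."""
    u = (p1 - t) / (2 * n * d)
    w = (p2 - t) / (2 * n * d)
    het = u * (1 - w) + w * (1 - u)
    return t + n * (het * (d + k * d) + u * w * 2 * d)

def bp_formula(p1, n, d, k, t=0.0):
    """Partial regression of F1 on P2 for a constant parent P1."""
    return (1 + k) / 2 - k / (2 * n * d) * (p1 - t)

def bp_from_table(p1, n, d, k, t=0.0):
    """Slope of one column: F1 increment from bottom-recessive to top-dominant P2."""
    low, high = t, t + 2 * n * d
    rise = expected_f1(p1, high, n, d, k, t) - expected_f1(p1, low, n, d, k, t)
    return rise / (high - low)

n, d, k = 2, 1.0, 0.5
column_slopes = {p1: bp_from_table(p1, n, d, k) for p1 in (0.0, 2.0, 4.0)}
```

With these trial values the left, middle and right columns give slopes of (1+k)/2, one half and (1-k)/2, so the slope falls by k as P1 passes from bottom recessive to top dominant.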
Thus, the values reported last year may be verified
and their interpretations clarified by direct inspection of table 2.
If it is not immediately obvious that the regression
estimates are unaffected by linkage and by relative frequencies of a and A
alleles, except as noted, the student may need to work out some specific
examples with numerical values assigned to d, kd, q, and per cent crossover and
calculate regressions by machine formulas as well as by direct substitution in
present formulas.
It is also clear that bp for the midcolumn or midrow
of table 2 is one half, and that mean bp for all three columns or all three
rows is one half. This latter case of mean bp for the whole table is the one
usually calculated for regression of offspring on one parent. If a and A
alleles are equally frequent, frequencies of the three columns are expected
in the ratio 1:2:1 and dominance effects on regression are effectively
cancelled. Note that bp is always one half if k = 0. But if a alleles are in
the minority, the frequency of the right column will be greater than that of
the left column and expectation is that dominance will depress mean partial regression
below one half. This seems to be an adequate explanation of low regressions of
yields of corn hybrids on yields of inbred parents. No alternative explanations
of higher order interactions of genes or of inefficient plot technic appear to
be necessary.
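The effect of unequal allele frequency on the mean regression can be sketched as a weighted mean of column slopes. The Python below is not part of the original report. It assumes that loci are independent, so that the number of loci AA in a homozygous line follows a binomial distribution with frequency q of the A allele; the function name `mean_bp` is hypothetical.

```python
from math import comb

def mean_bp(q, k, n=2):
    """Mean regression of F1 on one parent, averaged over columns of the table.

    q is the frequency of allele A among the homozygous lines, k the average
    degree of dominance, n the number of loci. Column weights are binomial,
    which assumes independent loci.
    """
    total = 0.0
    for j in range(n + 1):                       # j = number of loci AA in the constant parent
        weight = comb(n, j) * q ** j * (1 - q) ** (n - j)
        u = j / n                                # proportion of loci AA
        total += weight * ((1 + k) / 2 - u * k)  # column slope (1+k)/2 - uk
    return total

equal_frequency = mean_bp(0.5, 1.0)   # columns in the ratio 1:2:1
a_in_minority = mean_bp(0.8, 1.0)     # recessive a alleles in the minority
no_dominance = mean_bp(0.8, 0.0)
```

Under these assumptions the mean reduces to (1+k)/2 - qk, which is one half when q is one half or when k is zero, and falls below one half when A is the more frequent allele and k is positive.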
The function, F1 = b1a P1 + b1b P2 + b2 P1 P2 + C, may
be fitted to data on samples of homozygous parents and the several F1
crosses, or F2 by selfing F1. For F1 data,
estimates of b1 (that is, of b1a and b1b) are estimates of (1+k+kT/nd)/2,
on the assumption of additive gene action. Estimates of b2 are estimates
of -k/2nd. Regression of bp on P1 or on P2 is the same estimate of
-k/2nd.
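Fitting the surface is an ordinary least-squares problem in four coefficients. The sketch below, in Python, is not part of the original report and is not the machine procedure actually used. It generates error-free F1 values from hypothetical parameters (n = 10, d = 2, k = 3/2, T = 5) for five hypothetical lines, then solves the normal equations in exact fractions so the recovered coefficients can be compared with (1+k+kT/nd)/2 and -k/2nd.

```python
from fractions import Fraction

def model_f1(p1, p2, n, d, k, t):
    """Expected F1 under the additive scheme (hypothetical parameter values)."""
    return (t + (1 + k) / 2 * ((p1 - t) + (p2 - t))
            - k / (2 * n * d) * (p1 - t) * (p2 - t))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    size = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [Fraction(0)] * size
    for r in reversed(range(size)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (m[r][size] - tail) / m[r][r]
    return x

def fit_surface(records):
    """records: list of (P1, P2, F1). Returns [b1a, b1b, b2, C] by least squares."""
    rows = [[Fraction(p1), Fraction(p2), Fraction(p1) * Fraction(p2), Fraction(1)]
            for p1, p2, _ in records]
    ys = [Fraction(f1) for _, _, f1 in records]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(4)]
    return solve(xtx, xty)

n, d, k, t = 10, 2, Fraction(3, 2), 5
lines = [9, 13, 21, 29, 37]                      # hypothetical inbred yields
records = [(p1, p2, model_f1(p1, p2, n, d, k, t))
           for p1 in lines for p2 in lines if p1 != p2]
b1a, b1b, b2, c = fit_surface(records)
```

With real data the F1 values carry error and the two b1 estimates will differ; here they agree exactly because the trial data were generated without error.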
As indicated last year, the general regression
function may be solved to obtain estimates of bottom recessive, top dominant,
and average degree of dominance. From the regression of bp on P1,
the estimate of P1 for bp = 0 may be obtained. This is the critical
value of P1. Such a tester combines equally well with poor, medium
and good lines on the average. Better testers may be expected to combine better
with low lines than with high lines; that is, bp is negative.
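One way the fitted function might be solved is sketched below in Python. This is not part of the original report and is not necessarily the solution given last year. It rests on two assumptions of mine: that bp = b1 + b2 P1, so the critical value is -b1/b2, and that both the bottom recessive and the top dominant reproduce themselves when crossed with their own kind, so they are the two roots of F1(x, x) = x. The coefficient values are hypothetical, corresponding to n = 10, d = 2, k = 1.5, T = 5.

```python
from math import sqrt

def solve_surface(b1, b2, c):
    """Return (critical P1, bottom recessive, top dominant, k) from fitted coefficients.

    Assumes the fitted surface F1 = b1*P1 + b1*P2 + b2*P1*P2 + c and additive gene action.
    """
    critical_p = -b1 / b2                        # tester value at which bp = 0
    # Homozygous extremes satisfy b2*x**2 + (2*b1 - 1)*x + c = 0.
    disc = (2 * b1 - 1) ** 2 - 4 * b2 * c        # negative disc would raise ValueError below
    roots = sorted(((1 - 2 * b1) + sign * sqrt(disc)) / (2 * b2) for sign in (1, -1))
    bottom, top = roots
    k = -b2 * (top - bottom)                     # since b2 = -k/2nd and top - bottom = 2nd
    return critical_p, bottom, top, k

critical_p, bottom, top, k_hat = solve_surface(1.4375, -0.0375, -8.4375)
```

In this trial case the critical value lies between the bottom recessive and the top dominant, which under the additive scheme happens only when k exceeds 1.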
The several estimates reported last year are in all
respects surprisingly consistent with the hypothesis of overdominance in vigor
of corn. Tests of significance of b2 reported last year are apparently in error. The
appropriate test is for significance of departure from linear regression
(Snedecor 14.3). By this test no single estimate of b2 is
significantly different from zero, which may mean merely that numbers are too
small. The crucial point for overdominance is whether k is significantly
greater than 1. An additional set of data from C. M. Woodworth, Oren Bolin and
Earl R. Leng of the Illinois Experiment Station gives essentially the same
picture. The critical value of P1 is 4.4 bu./A. Yields of inbred
parents range from 2 to 40. Mean yield of F1s is 103.
We have then one more set of data consistent with
the others in supporting the conclusion that the more vigorous inbred lines in
hand are worthless or worse as testers for general combining ability, since bp
is zero or negative with such lines as testers.
That the few sets of data are not crucial for
overdominance is not surprising. They would not be crucial even if the test for
k greater than 1 showed high significance in each case. So few cases of
monogenic inheritance and linkage would not prove the chromosome theory of
heredity. When many more sets of data on different types of characters in both
cross- and self-fertilized species have been analyzed we may have a clearer
picture of where and to what extent dominance bias occurs. But even then the
results can hardly be conclusive and we will probably still need to be content
with theories which agree best with the whole body of evidence.
There is a suggestion in corn yield data that the
relative order of rank within either a group of inbred lines or within a group
of hybrids may be quite different in two different enviroments. Further, the
shape of the fitted regression surface may also vary greatly in response to
environmental effects. If alleles A' and A perform different functions in the
sense of East, A'A' may be usually inferior but sometimes superior to AA. The
heterozygote A'A if better buffered to environmental shifts may be on the
average superior to either homozygote. In these events, A will probably be the
more frequent and also the dominant favorable in the usual environment. But the
possibility exists that in some environments A' will be the dominant favorable,
with dominance still in the direction of greater vigor. The dominant favorable
A' will be in low frequency. The ratio k of an average dominance effect to an
average basic effect may be changed and with it the equilibrium gene frequency
ratio. All of these shifts will be likely to appear in the regression analyses
for a given sample of stable lines and F1s in different
environments.
Fred H. Hull
Addendum.
Since the above report was typed I have received
from Dr. Paul H. Harvey yield records on 12 lines and the 66 F1s and
have now completed the first part of the analysis. Yields of lines (selfed four
times) ranged from 12 to 24 bu./A. Mean F1 is 46. The critical value
of P is 25, one bushel above the top line. These data seem to agree with the
other sets and the conclusions drawn from them in all respects.
These last results have given me sufficient
confidence to propose a further attack for which a considerable body of
data is now available: data on F1s but not on the parent lines. Mean
F1 for any column of table 2 may be considered a measure of the
general combining ability G of the constant parent for that column. It is
easily demonstrated that G is a linear function of P. Hence, we may as well
estimate the G value of a tester which provides zero partial regression of F1
on G. Where the several F1s of a group of lines have been tested in
as many as four replications, one half of the replications may be employed to
estimate G values for the lines. The remaining replications may estimate F1s.
Correlation of experimental errors in the two estimates is thus eliminated.
The analysis, as before, is to run the simple regression of each F1
column on the parallel column of G; then to run the simple regression of the
first order regressions on G values of the respective constant parents; and
finally to estimate G for bp = 0. If this critical value of G is within the
range of the data the only direct interpretation I have found is overdominance.
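The three steps of this analysis can be sketched in Python as follows. The sketch is not part of the original report. It uses error-free F1 values generated from a hypothetical additive model (n = 10, d = 2, k = 1.5, T = 5) and hypothetical line values, and for simplicity it includes the combination of each line value with itself, which a real set of single crosses would not contain. The parent values serve only to generate the trial table; the analysis itself uses G and F1 alone.

```python
def ols(xs, ys):
    """Return (slope, intercept) of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return b, my - b * mx

def expected_f1(p1, p2, n=10, d=2.0, k=1.5, t=5.0):
    """Trial F1 values from the additive scheme (all parameters hypothetical)."""
    return (t + (1 + k) / 2 * ((p1 - t) + (p2 - t))
            - k / (2 * n * d) * (p1 - t) * (p2 - t))

lines = [9.0, 13.0, 21.0, 29.0, 37.0, 43.0]

# Step 0: general combining ability G of each line = mean of its F1 column.
g = {p: sum(expected_f1(p, q) for q in lines) / len(lines) for p in lines}

# Step 1: simple regression of each F1 column on the parallel column of G.
slopes = {p: ols([g[q] for q in lines], [expected_f1(p, q) for q in lines])[0]
          for p in lines}

# Step 2: regression of those first-order regressions on G of the constant parent.
trend, intercept = ols([g[p] for p in lines], [slopes[p] for p in lines])

# Step 3: critical value of G, at which the column regression is zero.
critical_g = -intercept / trend
```

With k above 1 in the trial model the critical G falls inside the range of G values of the lines, the situation interpreted above as overdominance.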
This kind of analysis has been run with the data on Late Yellow Single Crosses from the cooperative tests of the U.S. Department of Agriculture with Ohio, Indiana, Illinois, Kansas, Nebraska and Oklahoma in 1943. Mean G for each line was based on the data of five states for analysis with F1 data of the sixth state in each case. The critical value of G is below the G measure of the top line in three cases and slightly above in two cases. In the sixth case the trend of regression is upward and the data are apparently not consistent with any dominance bias toward high yield. Interstate correlations of G values of the ten lines are mostly positive but not very large. This kind of analysis is apparently of some worth where such data are available but it would seem that the attack outlined in the preceding paragraph would be more efficient and also applicable to more data.
Fred H. Hull