Mendelian interpretation of offspring-parent regressions

Dr. K. Mather on his recent visit to this country discussed some extensions of methods proposed by Fisher, Immer and Tedin (Genetics, 1932) for estimation of dominance bias in quantitative inheritance.

My own attack in the last News Letter is also an extension of the same. My approach seems to have some advantages from employing highly inbred or homozygous parents. Uncertainty on linkage effects is largely eliminated. Dominance does not reduce correlation between phenotypes of homozygous parents and the gametes they produce. I have found no particular advantage in requiring equal frequency of a and A alleles by confining study to populations which stem from a single selfed heterozygote in each case. Samples of homozygous lines, selected or otherwise, seem to be satisfactory. If all of this be true the method must have a wide utility and may be presented again from more of a Mendelian and less of a mathematical viewpoint.

If the heterozygote aAbBcCdD is crossed to the multiple recessive tester aabbccdd, testcross progeny may be classified on kinds and frequencies of four distinct qualitative characters to obtain a reflected view of dominant alleles in gametes of the heterozygote. This is the method of classical genetics. It has been seldom noted here that regression of number of plus characters in testcross progeny on number of dominant alleles in parent gamete is 1.0. Every plus allele in a gamete provides a plus character in the zygote, regardless of linkage.

The top dominant AABBCCDD is clearly worthless as a tester. Offspring-parent regression is zero. Intermediate testers are efficient in inverse proportion to the number or proportion of loci of AA type. Thus if testers in general are of aa type at one half of the loci which are heterozygous in the F₁ to be analyzed, a dominant allele in F₁ gametes will provide a dominant character in testcross progeny in one half of the cases. In the other half the dominant character is always provided by the tester and a dominant allele in the F₁ gamete can add nothing more. Regression is one half. Reduction of regression by dominant genes in the tester is purely a dominance effect. This dominance effect is reduced one half by selfing the testcrosses.

It hardly seems necessary to labor with the transfer of these concepts to the general field of multigenic inheritance where effects of the several genes combine in a single quantitative measure, and where dominance is taken into account quantitatively. In the former case, concern is primarily with frequencies. Basic effects of genes and dominance effects are both tacitly defined as unity throughout. In the latter case the two effects must be defined separately and quantitatively. We cannot assume that either is unity since we are concerned with degree of expression, not with just whether the character is or is not expressed.

In my attack the array of F₁ gametes is replaced with an array of gametes from an array of homozygous parents. The purpose is no longer to obtain a reflected picture of the gametic array. That array is already revealed in the array of homozygous parents. The purpose now is to estimate regressions of testcross progeny on gamete or homozygous parent with different testers. If both the bottom recessive and top dominant were available as testers, decline in regression from one case to the other would reveal directly the average degree of dominance. But neither of those two testers is likely to be available in multigenic cases. We are restricted to a study of regression relations with such testers as we may be able to develop.

For quantitative definitions of basic gene effects and dominance effects we may well employ the general scheme of Fisher, et al (1932) which is essentially that of Fisher in his 1918 paper on correlation between relatives, and of Mather on his recent visit. If the basic, phenotypic effect of substituting A for a is "d", phenotypes of aa, aA, AA are 0, d, 2d. The heterozygote is strictly intermediate. But if there is in addition an interaction of a with A to provide also a dominance effect "kd", the phenotypes are 0, d+kd, 2d. These quantities are deviations from a working origin at aa. Deviation of the heterozygote from strict intermediacy is kd, (h in the notation of Fisher, et al).

For a multiple set of genes a₁A₁, a₂A₂, ..., aₙAₙ, we may as well let d and kd be average values for the several loci. Then if gene action is additive each genotype is evaluated (estimated) by summing the several d's and kd's. The simplest case is n = 2. The checkerboard frame is

  P₂       P₂
  parent   gamete              F₁ phenotypes
  4d       A₁A₂     2d+2kd   3d+kd    3d+kd    4d+0
  2d       a₁A₂     d+kd     2d+2kd   2d+0     3d+kd
  2d       A₁a₂     d+kd     2d+0     2d+2kd   3d+kd
  0        a₁a₂     0+0      d+kd     d+kd     2d+2kd

  P₁ gamete         a₁a₂     A₁a₂     a₁A₂     A₁A₂
  P₁ parent         0        2d       2d       4d

Table 1

Phenotypes of the 3 parent classes are written on the margins along with the gametes of each class. Phenotypes alone are written in interior cells for offspring. It may be desirable in teaching to write genotypes also in the cells and to evaluate some of them by counting a d for each A allele and a kd for each aA locus or each interaction of unlike alleles. It may also be desirable to write genotypes of parents and evaluate them, noting absence of dominance effects.
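The counting rule just described (a d for each A allele, a kd for each aA locus) can be sketched in Python to evaluate every cell of the two-locus checkerboard. The numerical values d = 1 and k = 0.5 and the function names are assumptions made only for illustration.

```python
def phenotype(genotype, d=1.0, k=0.5):
    """Value of a genotype given as a list of (allele, allele) pairs, one per locus."""
    total = 0.0
    for x, y in genotype:
        total += d * ((x == "A") + (y == "A"))   # one d for each A allele
        if x != y:
            total += k * d                        # one kd for each aA locus
    return total

def f1_value(gamete1, gamete2, d=1.0, k=0.5):
    """F1 value for two gametes written as strings, e.g. 'Aa' = A at locus 1, a at locus 2."""
    return phenotype(list(zip(gamete1, gamete2)), d, k)

def parent_value(gamete, d=1.0, k=0.5):
    """Value of the homozygous parent that produces this gamete."""
    return f1_value(gamete, gamete, d, k)

columns = ["aa", "Aa", "aA", "AA"]   # P1 gametes, left to right
rows = ["AA", "aA", "Aa", "aa"]      # P2 gametes, top to bottom
for g2 in rows:
    print(g2, parent_value(g2), [f1_value(g1, g2) for g1 in columns])
```

Each printed row gives the P₂ gamete, the value of the homozygous P₂ parent, and the four F₁ values in that row of the checkerboard. Parent values carry no kd term, since a homozygote has no aA locus.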

Table 1 is a simple regression surface. Our avowed purpose is to study the effect of k on the shape of the surface that we may interpret shapes of data surfaces in terms of k, average degree of dominance.

In practice the homozygotes a₁a₁ A₂A₂ and A₁A₁ a₂a₂ are ordinarily indistinguishable. This means that the two center columns and two center rows of table 1 may as well be pooled to conform with the situation of data on a quantitative character. Pooling provides,

  P₂                F₁ phenotypes
  4d       2d+2kd   3d+kd    4d+0
  2d       d+kd     2d+kd    3d+kd
  0        0+0      d+kd     2d+2kd

  P₁       0        2d       4d

Table 2

Note that the entry in the central cell of table 2, for example, is the mean of the four central cells of table 1. It is the predicted (average) result of crosses of homozygotes of the types indicated on the margins. Deviations of the four crosses from the mean are deviations from regression due entirely to dominance, that is, to variation in degree of heterozygosity, or specific combining ability. These variations are not predictable from data on the parents. The teacher should write frequency distributions of individual crosses in each cell of table 2 along with the means given here.
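The pooling step can be illustrated with a short Python sketch that groups the two-locus crosses by the phenotype class of each parent and reports the cell mean and the deviations of individual crosses from it. The values d = 1 and k = 0.5 and the names used are assumptions for illustration only.

```python
from itertools import product

D, K = 1.0, 0.5   # illustrative values for d and k (assumptions)

def f1_value(g1, g2):
    """F1 value from two gametes; each gamete is a tuple of 0/1 (1 = A)."""
    value = 0.0
    for x, y in zip(g1, g2):
        value += D * (x + y)       # one d for each A allele
        if x != y:
            value += K * D         # one kd for each heterozygous locus
    return value

def pooled_table(n_loci=2):
    """Mean F1 and deviations for each pair of parent classes (count of AA loci)."""
    gametes = list(product((0, 1), repeat=n_loci))
    cells = {}
    for g1, g2 in product(gametes, repeat=2):
        cells.setdefault((sum(g1), sum(g2)), []).append(f1_value(g1, g2))
    table = {}
    for key, values in cells.items():
        mean = sum(values) / len(values)
        table[key] = (mean, [v - mean for v in values])
    return table

table = pooled_table()
mean, deviations = table[(1, 1)]   # central cell: both parents AA at one locus
print(mean, deviations)
```

With these values the central cell has mean 2d + kd, and the individual crosses deviate from it by plus or minus kd, while the marginal cells show no deviations at all.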

Note further that, while tables 1 and 2 represent two-factor checkerboards of classical genetics with gametes of F₁ recorded on the margins and F₂ phenotypes in interior cells, the view here is arrays of homozygous lines on the margins with F₁ phenotypes of crosses of such lines in cells of the tables. Subsequently, interior values will be referred to as F₁s in agreement with modern corn breeding practice. The two situations are strictly analogous only when a and A are equally frequent in the sample of homozygous parents.

If table 2 is expanded to include many loci, parent values are 0, 2d, 4d, ..., 2nd. A statement of the mean F₁ of any cell in terms of parent values would be the general regression function of F₁ on P₁ and P₂. The solution of this problem was given in the previous News Letter. The mean of any cell in a table of the type of table 2, may be calculated by solving a smaller checkerboard. Detailed arrays of gametes of the two parent types are written on the margins. But this is merely taking the product of two gametic arrays, a fundamental principle of Mendelism. Hence, if u and w are the proportions of loci AA in P₁ and P₂ respectively, gametic arrays are represented in general by (1-u)a + uA and (1-w)a + wA. In all of the crosses of P₁ type parents x P₂ type parents together, expectations are (1-u)(1-w)aa, [u(1-w) + w(1-u)] aA, uwAA. The sum of these three proportions, each multiplied by n and by the respective phenotypes 0, d+kd, 2d, is the expected increment of mean F₁ over the multiple recessive T. Making the substitutions u = (P₁-T)/2nd and w = (P₂-T)/2nd provides the desired function.
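The construction in this paragraph can be written out as a small Python sketch. The first function follows the text step by step; the second is the same quantity with the products multiplied out, derived here for illustration and not quoted from the previous News Letter. Function names and numerical values are assumptions.

```python
def expected_f1(p1, p2, t, n, d, k):
    """Mean F1 from the genotype proportions among the n loci."""
    u = (p1 - t) / (2 * n * d)        # proportion of loci AA in P1
    w = (p2 - t) / (2 * n * d)        # proportion of loci AA in P2
    het = u * (1 - w) + w * (1 - u)   # proportion of loci aA in the F1
    hom = u * w                       # proportion of loci AA in the F1
    # aa loci contribute 0, aA loci d + kd, AA loci 2d.
    return t + n * (het * (d + k * d) + hom * 2 * d)

def expected_f1_expanded(p1, p2, t, n, d, k):
    """Same quantity after multiplying out: linear terms plus a product term."""
    return (t + (1 + k) / 2 * (p1 + p2 - 2 * t)
            - k / (2 * n * d) * (p1 - t) * (p2 - t))

# Central cell of the two-locus table with d = 1, k = 0.5, T = 0: expect 2d + kd.
print(expected_f1(2.0, 2.0, 0.0, 2, 1.0, 0.5))
print(expected_f1_expanded(2.0, 2.0, 0.0, 2, 1.0, 0.5))
```

The expanded form makes visible that the coefficient of the product P₁P₂ is -k/2nd, so that the surface is a plane only when k = 0.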

The concept u = (P₁-T)/2nd might be presented effectively to a class by laying off an arbitrary scale to represent the range of phenotype from bottom recessive to top dominant.

T ------------ P₁ ------------------------ (2nd+T)

The scheme is to count 2d for each locus AA as the increment above T, hence, 2nd where all n loci are AA. The position of any homozygote P₁ on this scale reveals directly the proportion of loci AA in P₁, u = (P₁-T)/2nd.

The purpose of T is to adjust for the possibility that the phenotype of the bottom recessive is not zero on the data scale.

It is instructive to verify from table 2 results reported last year. The left column may represent a series of hybrids having a common parent P₁, the tester, which is aa at each locus. Lines being tested are represented on the parallel margin as different values of the variable P₂. It is clear that if the tester is completely recessive, every substitution of AA for aa in P₂ will provide a substitution of aA for aa in F₁. Regression of F₁ on P₂ is (aA-aa)/(AA-aa) or (one basic gene effect plus one dominance effect)/(two basic effects) or (1+k)/2. Note that the increment from one cell to the next, left column of table 2, is d+kd and that the corresponding increment in the P₂ column is 2d. The ratio is (1+k)/2. When P₁ is aa throughout P₁-T = 0. Substitute in last year's formula for bp to obtain bp = (1+k)/2, if P₁-T = 0.

Similarly from the right column of table 2, bp = (1-k)/2, when P₁ is AA throughout, (P₁-T) = 2nd. Expansion of table 2 to include many loci will not provide different results.

If, as in most actual cases, some proportion u of the loci of P₁ is AA and 1-u is aa, the weighted mean increment of F₁ is [n(1-u)(d+kd) + nu(d-kd)] /n. Or the weighted mean of slopes is (1-u)(1+k)/2 + u(1-k)/2 = (1+k)/2 -uk. Substituting u = (P₁-T)/2nd, bp = (1+k)/2 - (k/2nd)(P₁-T).

If bp is (1+k)/2 in the left column of table 2 and (1-k)/2 in the right column the increment of bp across the table is [(1-k) - (1+k)]/2 = -k. The concurrent increment of u is 1, and of P₁ it is 2nd. Regression of bp on u is -k and on P₁ it is -k/2nd, as the formula bp = (1+k)/2 - (k/2nd)(P₁-T) expressly states.
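As a numerical check on the formula for bp, the following Python sketch compares it with the slope of mean F₁ on P₂ read directly from the additive model, the tester P₁ being held constant. The values n = 2, d = 1, k = 0.5, T = 0 and the function names are assumptions for illustration.

```python
def f1_mean(p1, p2, t, n, d, k):
    """Mean F1 under additive gene action with dominance kd at each locus."""
    u = (p1 - t) / (2 * n * d)
    w = (p2 - t) / (2 * n * d)
    return t + n * ((u * (1 - w) + w * (1 - u)) * (d + k * d) + u * w * 2 * d)

def bp_formula(p1, t, n, d, k):
    """bp = (1+k)/2 - (k/2nd)(P1 - T)."""
    return (1 + k) / 2 - k / (2 * n * d) * (p1 - t)

def bp_from_table(p1, t, n, d, k):
    """Slope of mean F1 on P2 between the lowest and highest P2, P1 fixed."""
    low, high = t, t + 2 * n * d
    rise = f1_mean(p1, high, t, n, d, k) - f1_mean(p1, low, t, n, d, k)
    return rise / (high - low)

for p1 in (0.0, 2.0, 4.0):   # left, middle and right columns of the pooled table
    print(p1, bp_formula(p1, 0.0, 2, 1.0, 0.5), bp_from_table(p1, 0.0, 2, 1.0, 0.5))
```

Because mean F₁ is linear in P₂ for any fixed P₁, the slope between the two extremes is the regression for the whole column, and the two calculations should agree for every tester.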

Thus, the values reported last year may be verified and their interpretations clarified by direct inspection of table 2.

If it is not immediately obvious that the regression estimates are unaffected by linkage and by relative frequencies of a and A alleles, except as noted, the student may need to work out some specific examples with numerical values assigned to d, kd, q, and per cent crossover and calculate regressions by machine formulas as well as by direct substitution in present formulas.
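One such worked example might be sketched in Python as follows. It is a simplified illustration only: two loci, the same frequency q of AA lines at both loci, and an association term `assoc` between the loci used as a stand-in for the effect of linkage rather than an explicit crossover percentage. These simplifications, the function name, and the numerical values are all assumptions.

```python
def regression_on_p2(tester, q, assoc, d=1.0, k=0.5):
    """Weighted regression of F1 on P2 over homozygous lines at two loci.

    tester: (t1, t2), 1 where the tester is AA and 0 where it is aa.
    q:      frequency of AA lines at each locus (same at both; an assumption).
    assoc:  excess frequency of coupling types, a stand-in for linkage effects.
    """
    freqs = {
        (1, 1): q * q + assoc,
        (1, 0): q * (1 - q) - assoc,
        (0, 1): q * (1 - q) - assoc,
        (0, 0): (1 - q) ** 2 + assoc,
    }
    rows = []
    for line, f in freqs.items():
        p2 = 2 * d * sum(line)
        # Like homozygotes give 0 or 2d at a locus; unlike give d + kd.
        f1 = sum(2 * d * x if x == t else d + k * d for x, t in zip(line, tester))
        rows.append((f, p2, f1))
    mean_p = sum(f * p for f, p, _ in rows)
    mean_f = sum(f * y for f, _, y in rows)
    cov = sum(f * (p - mean_p) * (y - mean_f) for f, p, y in rows)
    var = sum(f * (p - mean_p) ** 2 for f, p, _ in rows)
    return cov / var

for q, assoc in [(0.5, 0.0), (0.3, 0.1), (0.7, -0.05)]:
    print(q, assoc,
          regression_on_p2((0, 0), q, assoc),   # recessive tester
          regression_on_p2((1, 0), q, assoc),   # tester AA at one locus of two
          regression_on_p2((1, 1), q, assoc))   # top dominant tester
```

Under these assumptions the three regressions stay at (1+k)/2, one half, and (1-k)/2 for every choice of q and association, which is what the formula (1+k)/2 - uk gives for u = 0, 1/2 and 1.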

It is also clear that bp for the midcolumn or midrow of table 2 is one half, and that mean bp for all three columns or all three rows is one half. This latter case of mean bp for the whole table is the one usually calculated for regression of offspring on one parent. If a and A alleles are equally frequent, frequencies of the three columns are expected in the ratio 1:2:1 and dominance effects on regression are effectively cancelled. Note that bp is always one half if k = 0. But if a alleles are in the minority, the frequency of the right column will be greater than that of the left column and expectation is that dominance will depress mean partial regression below one half. This seems to be an adequate explanation of low regressions of yields of corn hybrids on yields of inbred parents. No alternative explanations of higher order interactions of genes or of inefficient plot technic appear to be necessary.

The function, F₁ = b_1aP₁ + b_1bP₂ + b₂P₁P₂ + C may be fitted to data on samples of homozygous parents and the several F₁ crosses, or F₂ by selfing F₁. For F₁ data, estimates of b₁ are estimates of (1+k+kT/nd)/2, on the assumption of additive gene action. Estimates of b₂ are estimates of -k/2nd. Regression of bp on P₁ or on P₂ is the same estimate of -k/2nd.

As indicated last year, the general regression function may be solved to obtain estimates of bottom recessive, top dominant, and average degree of dominance. From the regression of bp on P₁, the estimate of P₁ for bp = 0 may be obtained. This is the critical value of P₁. Such a tester combines equally well with poor, medium and good lines on the average. Better testers may be expected to combine better with low lines than with high lines; that is, bp is negative.
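The estimation of the critical value of P₁ can be sketched in Python as a two-stage regression: first the simple regression of each tester's F₁s on the values of the lines crossed to it, then the regression of those slopes on the tester values. The data below are synthetic and error-free, generated from the additive model with assumed values T = 5, nd = 10, k = 1.5; the function names are hypothetical, and real data would of course carry sampling error.

```python
def slope_intercept(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return b, my - b * mx

def critical_tester_value(testers, lines, f1):
    """P1 at which bp = 0; f1[i][j] is the F1 of tester i crossed with line j."""
    bps = [slope_intercept(lines, row)[0] for row in f1]   # first-order regressions
    b, a = slope_intercept(testers, bps)                    # bp regressed on P1
    return -a / b

def model_f1(p1, p2, t=5.0, nd=10.0, k=1.5):
    """Synthetic F1 from the additive model (assumed parameter values)."""
    return t + (1 + k) / 2 * (p1 + p2 - 2 * t) - k / (2 * nd) * (p1 - t) * (p2 - t)

testers = [7.0, 13.0, 19.0]
lines = [6.0, 10.0, 14.0, 18.0, 22.0]
f1 = [[model_f1(p1, p2) for p2 in lines] for p1 in testers]
print(critical_tester_value(testers, lines, f1))
```

For these assumed values the result should equal T + nd(1+k)/k, about 21.7, which lies inside the range T to T + 2nd because k exceeds 1. With k less than 1 the critical value would fall above the top dominant.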

The several estimates reported last year are in all respects surprisingly consistent with the hypothesis of overdominance in vigor of corn. Tests of significance of b₂ reported last year are apparently in error. The appropriate test is for significance of departure from linear regression (Snedecor 14.3). By this test no single estimate of b₂ is significantly different from zero which may mean merely that numbers are too small. The crucial point for overdominance is whether k is significantly greater than 1. An additional set of data from C. M. Woodworth, Oren Bolin and Earl R. Leng of the Illinois Experiment Station gives essentially the same picture. The critical value of P₁ is 4.4 bu./A. Yields of inbred parents range from 2 to 40. Mean yield of F₁s is 103.

We have then one more set of data consistent with the others in supporting the conclusion that the more vigorous inbred lines in hand are worthless or worse as testers for general combining ability, since bp is zero or negative with such lines as testers.

That the few sets of data are not crucial for overdominance is not surprising. They would not be crucial even if the test for k greater than 1 showed high significance in each case. So few cases of monogenic inheritance and linkage would not prove the chromosome theory of heredity. When many more sets of data on different types of characters in both cross- and self-fertilized species have been analyzed we may have a clearer picture of where and to what extent dominance bias occurs. But even then the results can hardly be conclusive and we will probably still need to be content with theories which agree best with the whole body of evidence.

There is a suggestion in corn yield data that the relative order of rank within either a group of inbred lines or within a group of hybrids may be quite different in two different environments. Further, the shape of the fitted regression surface may also vary greatly in response to environmental effects. If alleles A' and A perform different functions in the sense of East, A'A' may be usually inferior but sometimes superior to AA. The heterozygote A'A if better buffered to environmental shifts may be on the average superior to either homozygote. In these events, A will probably be the more frequent and also the dominant favorable in the usual environment. But the possibility exists that in some environments A' will be the dominant favorable, with dominance still in the direction of greater vigor. The dominant favorable A' will be in low frequency. The ratio k of an average dominance effect to an average basic effect may be changed and with it the equilibrium gene frequency ratio. All of these shifts will be likely to appear in the regression analyses for a given sample of stable lines and F₁s in different environments.

Fred H. Hull

Addendum.

Since the above report was typed I have received from Dr. Paul H. Harvey yield records on 12 lines and the 66 F₁s and have now completed the first part of the analysis. Yields of lines (selfed four times) ranged from 12 to 24 bu./A. Mean F₁ is 46. The critical value of P is 25, one bushel above the top line. These data seem to agree with the other sets and the conclusions drawn from them in all respects.

These last results have given me sufficient confidence to propose a further attack for which a considerable body of data is now available: data on F₁s but not on the parent lines. Mean F₁ for any column of table 2 may be considered a measure of the general combining ability G of the constant parent for that column. It is easily demonstrated that G is a linear function of P. Hence, we may as well estimate the G value of a tester which provides zero partial regression of F₁ on G. Where the several F₁s of a group of lines have been tested in as many as four replications, one half of the replications may be employed to estimate G values for the lines. The remaining replications may estimate F₁s. Correlation of experimental errors in the two estimates is thus eliminated. The analysis, as before, is to run the simple regression of each F₁ column on the parallel column of G; then to run the simple regression of the first order regressions on G values of the respective constant parents; and finally to estimate G for bp = 0. If this critical value of G is within the range of the data the only direct interpretation I have found is overdominance.

This kind of analysis has been run with the data on Late Yellow Single Crosses from the cooperative tests of the U.S. Department of Agriculture with Ohio, Indiana, Illinois, Kansas, Nebraska and Oklahoma in 1943. Mean G for each line was based on the data of five states for analysis with F₁ data of the sixth state in each case. The critical value of G is below the G measure of the top line in three cases and slightly above in two cases. In the sixth case the trend of regression is upward and the data are apparently not consistent with any dominance bias toward high yield. Interstate correlations of G values of the ten lines are mostly positive but not very large. This kind of analysis is apparently of some worth where such data are available but it would seem that the attack outlined in the preceding paragraph would be more efficient and also applicable to more data.

Fred H. Hull