shape. Although the regression-equations obtained would not
accurately fit the original material, yet they would have a certain
correspondence thereto. What sort of correspondence may be
illustrated by an example in games of chance, which Professor
Weldon kindly supplied. Three half-dozens of dice having been thrown, the number of dice with more than three points in that dozen which is made up of the first and the second half-dozen is taken for *y*; the number of sixes in the dozen made up of the first and the third half-dozen is taken for *x*.

[Fig. 13. Diagram not reproduced: ordinate and abscissa each run from 0 to 2, and the comparative probability of each twofold value is entered in its compartment.]

Thus
each twofold observation (*xy*) is the
sum of six twofold elements, each of
which is subject to a law of frequency
represented in fig. 13; where^{[1]}
the figures outside denote the
number of successes of each kind, for the
ordinate the number of dice with
more than three points (out of a cast
of two dice), for the co-ordinate the
number of sixes (out of a cast of two dice, one of which is common
to the aforesaid cast); and the figures inside denote the comparative
probabilities of each twofold value (*e.g.* the probability of obtaining
in the first cast two dice each with more than three points, and
in the second cast two sixes, is 1/72). Treating this law of frequency
according to the rule which is proper to the normal law,
we have (for the element), if the sides of the compartments each = *i*,

σ_{1}^{2} = (5/18)*i*^{2}; σ_{2}^{2} = (1/2)*i*^{2}; r = 1/(2√5).

Whence for the regression-equation which gives the value of the
ordinate most probably associated with an assigned value of the
abscissa we have *y* = *x* × rσ_{2}/σ_{1} = 0.3*x*; and for the other regression-equation,
*x* = *y*/6.
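These coefficients can be checked by exhaustive enumeration of the element. The sketch below (Python, with variable names of my own choosing) runs through the 6³ equally likely values of the three dice — one die common to both casts — and recovers the two regression slopes exactly:

```python
from fractions import Fraction
from itertools import product

# d1 is the die common to both casts; d2 completes the cast scored for y,
# d3 the cast scored for x.
n = 0
sx = sy = sxx = syy = sxy = Fraction(0)
for d1, d2, d3 in product(range(1, 7), repeat=3):
    y = (d1 > 3) + (d2 > 3)    # dice with more than three points
    x = (d1 == 6) + (d3 == 6)  # sixes (one die common to both casts)
    n += 1
    sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y

mx, my = sx / n, sy / n
var_x = sxx / n - mx ** 2   # sigma_1^2 = 5/18
var_y = syy / n - my ** 2   # sigma_2^2 = 1/2
cov = sxy / n - mx * my     # r*sigma_1*sigma_2 = 1/12

print(cov / var_x)  # regression of y on x: 3/10
print(cov / var_y)  # regression of x on y: 1/6
```

The exact rational arithmetic confirms that the regression of *y* on *x* is 0.3 and that of *x* on *y* is 1/6, as stated above.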

| *y* \ *x* | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 1 |  |  |  |  |  |  |  |  |  |  |  |  |
| 11 | 4 | 3 | 3 | 3 | 1 |  |  |  |  |  |  |  |  |
| 10 | 3 | 17 | 15 | 13 | 10 | 4 | 3 | 1 |  |  |  |  |  |
| 9 | 12 | 51 | 59 | 61 | 36 | 14 | 5 | 3 |  |  |  |  |  |
| 8 | 36 | 135 | 154 | 150 | 64 | 21 | 5 | 2 | 1 |  |  |  |  |
| 7 | 74 | 195 | 260 | 179 | 112 | 35 | 5 | 1 |  |  |  |  |  |
| 6 | 90 | 248 | 254 | 170 | 75 | 26 | 3 |  |  |  |  |  |  |
| 5 | 93 | 220 | 230 | 124 | 51 | 8 | 2 |  |  |  |  |  |  |
| 4 | 86 | 162 | 127 | 75 | 19 | 4 | 1 |  |  |  |  |  |  |
| 3 | 37 | 86 | 56 | 17 | 6 | 2 |  |  |  |  |  |  |  |
| 2 | 14 | 23 | 23 | 4 | 3 |  |  |  |  |  |  |  |  |
| 1 | 2 | 4 |  |  |  |  |  |  |  |  |  |  |  |
| 0 |  |  |  |  |  |  |  |  |  |  |  |  |  |

Accordingly, in Professor Weldon's statistics,
which are reproduced in the annexed diagram, when *x* = 3 the
most probable value of *y* ought to be 1. And in fact this expectation
is verified, *x* and *y* being measured along lines drawn through the
centre of the compartment, which *ought* to have the maximum of
content, representing the concurrence of one dozen with *two* sixes
and another dozen with *six* dice having each more than three points,
the compartment which in fact contains 254 (*almost* the maximum
content). In the absence of observations at *x* = −3*i* or *y* = ±6*i*,
the regression-equations cannot be further verified. At least they
have begun to be verified by batches composed of six elements,
whereas they are not verifiable at all for the simple elements. The
normal formula describes the given statistics as they behave, not
when by themselves, but when massed in crowds; the regression-equation
does not tell us that if *x*′ is the magnitude of one member
the most probable magnitude of the other member associated therewith
is r*x*′, but that if *x*′ is the average of several samples of the first
member, then r*x*′ is the most probable average for the specimens
of the other member associated with those samples. Mr Yule's
proposal to construct regression-equations according to the normal
rule “without troubling to investigate the normality of the distribution”^{[2]}
admits of this among other explanations.^{[3]} Mr Yule's
own view of the subject is well worthy of attention.

154. *Sheppard's Corrections.* In the determination of the standard-deviation
proper to the law of error (and other constants proper to other laws of frequency)
it commonly happens that, besides the inaccuracy,
which has been estimated, due to the paucity of the
data, there is an inaccuracy due to their *discrete* character:
the circumstance that measurements, *e.g.* of human heights, are
given in comparatively large units, *e.g.* inches, while the real objects
are more perfectly graduated. Mr Sheppard has prescribed a remedy
for this imperfection. For the standard deviation let μ_{2} be the
rough value obtained on the supposition that the observations
are massed at intervals of unit length (not spread out continuously,
as ideal measurements would be); then the proper value, the mean
*integral* of deviation squared, say (μ_{2}) = μ_{2} − (1/12)*h*^{2}, where *h* is the size
of a unit, *e.g.* an inch. It is not to be objected to this correction
that it becomes nugatory when it is less than the probable error to
which the measurement is liable on account of the paucity of observations.
For, as the correction is always in one direction, that of
subtraction, it tends in the long run to be advantageous even though
masked in particular instances by larger fluctuating errors.^{[4]}

155. *Pearson's Criterion of Empirical Verification.* Professor Pearson has
given a beautiful application of the theory of correlation to test the
empirical evidence that a given group conforms to a proposed formula,
*e.g.* the normal law of error.^{[5]}

Supposing the constants of the proposed function to
be known—in the case of the normal law the arithmetic
mean and modulus—we could determine the
position of any percentile, *e.g.* the median, say *a*. Now the probability
that if any sample numbering *n* were taken at random
from the complete group, the median of the sample, *a*′, would lie at
such a distance from a that there should be *r* observations between
*a* and *a*′ is

(2/√π) ∫_{*r*√(2/*n*)}^{∞} e^{−*t*^{2}} d*t*.^{[6]}
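This tail probability can be checked against exact binomial arithmetic. In the sketch below (Python; the case *n* = 100, *r* = 10 is my own illustrative choice), the normal-tail value — erfc(*r*√(2/*n*)), equivalent to (2/√π)∫e^{−t²}dt taken from *r*√(2/*n*) — is compared with the exact probability that at least *r* observations separate the true and sample medians:

```python
from math import erfc, comb, sqrt

n, r = 100, 10  # sample size and excess (illustrative values)

# Normal approximation: (2/sqrt(pi)) * integral of exp(-t^2), lower limit r*sqrt(2/n)
approx = erfc(r * sqrt(2.0 / n))

# Exact: the number B of observations below the true median is Binomial(n, 1/2);
# at least r observations lie between the two medians when |B - n/2| >= r.
exact = (sum(comb(n, k) for k in range(0, n // 2 - r + 1))
         + sum(comb(n, k) for k in range(n // 2 + r, n + 1))) / 2 ** n

print(approx, exact)
```

For a sample of this size the two values already agree to within about a percentage point, supporting the use of the integral as a criterion of improbable excess.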

If, then, any observed set has an excess which makes the above
written integral very small, the set has probably not been formed
by a random selection from the supposed given complete group.
To extend this method to the case of two, or generally *n*, percentiles,
forming (*n* + 1) compartments, it must be observed that the excesses
say e and e′, are not independent but correlated. To measure the
probability of obtaining a pair of excesses respectively as large as
e and e′, we have now (corresponding to the extremity of the probability-curve
in the simple case) the solid content of a certain
probability-surface outside the curve of equal probability which
passes through the points on the plane *xy* assigned by e, e′ (and the
other data). This double, or in general multiple, integral, say P, is
expressed by Professor Pearson with great elegance in terms of
the quadratic factor, called by him χ^{2}, which forms the exponent of
the expression for the probability that a particular system of the
values of the correlated e, e′, &c., should concur—

P = √(2/π) ∫_{χ}^{∞} e^{−½χ^{2}} dχ + √(2/π) e^{−½χ^{2}} {χ + χ^{3}/3 + χ^{5}/(3·5) + ... + χ^{*n*−2}/(1·3·5 ⋯ (*n*−2))}

when *n* is odd; with an expression different in form, but nearly
coincident in result, when *n* is even. The practical rule derived
from this general theorem may thus be stated. Find from the given
observations the probable values of the coefficients pertaining to
the formula which is supposed to represent the observations.
Calculate from the coefficients a certain number, say *n*, of percentiles;
thereby dividing the given set into *n* + 1 sections, any of which,
according to calculation, ought to contain say *m* of the observations,
while in fact it contains *m*′. Put e for *m*′ − *m*; then χ^{2} = ∑*e*^{2}/*m*.
Professor Pearson has given in an appended table the values of P
corresponding to values of *n* + 1 up to 20, and values of χ^{2} up to 70.
He does not conceal that there is some laxity involved in the circumstance
that the coefficients employed are not known exactly, only
inferred with probability.^{[7]}
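The practical rule reduces to a few lines of code. A minimal sketch (Python; the die-throwing counts below are hypothetical, chosen only to exercise the formula):

```python
def chi_squared(observed, expected):
    """Pearson's criterion: with e = m' - m in each compartment,
    chi^2 = sum of e^2 / m over all compartments."""
    return sum((mp - m) ** 2 / m for mp, m in zip(observed, expected))

# Hypothetical example: 120 throws of a die sorted into six compartments,
# each of which ought by calculation to contain m = 20 observations.
observed = [18, 23, 19, 25, 16, 19]   # the m' actually contained
expected = [20.0] * 6                 # the m required by the formula
print(chi_squared(observed, expected))
```

With these counts the excesses are −2, 3, −1, 5, −4, −1, giving χ² = 56/20 = 2.8, a value that Pearson's table would show to be entirely unremarkable.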

156. Here is one of Professor Pearson's illustrations. The table
on the next page gives the distribution of 1000 shots fired at a line in a
target, the hits being arranged in belts drawn on the target parallel
to the line. The “normal distribution” is obtained from a
normal curve, of which the coefficients are determined from the
observations. From the value of χ^{2}, viz. 45.8, and of (*n* + 1),
viz. 11, we deduce, with sufficient accuracy from Professor Pearson's
table, or more exactly from the formula on which the table is based,
that P = .000,001,5 · ·. “In other words, if shots are distributed
on a target according to the normal law, then such a distribution
as that cited could only be expected to occur on an average some
15 or 16 times in 10,000,000 times.”
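The arithmetic of this example can be retraced. When *n* is even, the tail probability collapses to the finite series e^{−½χ²} Σ (½χ²)^{k}/k! taken for k from 0 to n/2 − 1; the sketch below (Python) applies that form to χ² = 45.8 with *n* + 1 = 11 compartments:

```python
from math import exp, factorial

def pearson_P(chi2, compartments):
    """Tail probability P for Pearson's chi^2 with n + 1 compartments,
    using the closed-form series valid when n (= compartments - 1) is even."""
    n = compartments - 1
    assert n % 2 == 0, "series form assumes n even"
    half = chi2 / 2.0
    return exp(-half) * sum(half ** k / factorial(k) for k in range(n // 2))

# Pearson's target example: chi^2 = 45.8 with n + 1 = 11 compartments.
P = pearson_P(45.8, 11)
print(P)  # about 1.5e-6: some 15 or 16 times in 10,000,000
```

The computed P of roughly 0.000,001,5 reproduces the figure quoted in the text.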

157. *The Criterion Criticized.* “Such a distribution” in this argument must be interpreted as a distribution for which it is claimed that the observations are all independent of each other. Suppose that there were only 500 independent observations, the remainder being merely duplicates of these 500. Then in the above

- ↑ Cf. above, par. 115.
- ↑ *Proc. Roy. Soc.*, vol. 60, p. 477.
- ↑ Below, par. 168.
- ↑ Just as the removal of a tax tends to be in the long run beneficial to the consumer, though the benefit on any particular occasion may be masked by fluctuations of price due to other causes.
- ↑ *Phil. Mag.* (July, 1900).
- ↑ As shown above, par. 103.
- ↑ *Loc. cit.* p. 166.