if not known beforehand, may be inferred, as in the simpler case,
from a set of observations. Similar statements holding for the
other equations, the probability that the given set of observations
*f*_{1}, *f*_{2}, &c., should have resulted from a particular system of values
for *x*, *y* . . is P = J exp −[(*a*_{1}*x* + *b*_{1}*y* − *f*_{1})^{2}/*c*_{1}^{2} + (*a*_{2}*x* + *b*_{2}*y* − *f*_{2})^{2}/*c*_{2}^{2} + &c.],
where J is a constant determined on the same principle as in the
analogous simpler cases.^{[1]} The condition that P should be a
maximum gives as many linear equations for the determination
of *x*′, *y*′ . . . as there are unknown quantities.
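
Taking logarithms, making P a maximum is the same as making the weighted sum of squares in the exponent a minimum. Equating its derivatives with respect to *x* and *y* to zero gives the normal equations. The following minimal Python sketch covers two unknowns only. The function name and the tuple layout (a, b, f, c), for an equation a x + b y = f observed with modulus c, are illustrative assumptions. The 2 × 2 system is solved by Cramer's rule.

```python
def least_squares_two_unknowns(rows):
    """rows: list of (a, b, f, c) for equations a*x + b*y = f with modulus c.

    Returns the (x', y') that make J*exp(-sum((a*x + b*y - f)**2 / c**2))
    a maximum, i.e. the weighted sum of squares a minimum."""
    # Accumulate the coefficients of the normal equations, weight = 1/c^2.
    saa = sab = sbb = saf = sbf = 0.0
    for a, b, f, c in rows:
        w = 1.0 / c ** 2
        saa += w * a * a
        sab += w * a * b
        sbb += w * b * b
        saf += w * a * f
        sbf += w * b * f
    # Normal equations:  saa*x + sab*y = saf,  sab*x + sbb*y = sbf
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("the equations do not determine x and y")
    x = (saf * sbb - sab * sbf) / det
    y = (saa * sbf - sab * saf) / det
    return x, y


# Hypothetical observations of x + y, x - y and x + 2y (the last less precise).
obs = [(1, 1, 3.0, 1.0), (1, -1, 1.0, 1.0), (1, 2, 4.0, 2.0)]
print(least_squares_two_unknowns(obs))  # consistent data, so x = 2, y = 1
```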

145. The solution proper to the case where the observations are
known to range according to the normal law may be extended to
numerous observations ranging under any law, on the principles
which justify the use of the Method of Least Squares in the case of
a single *quaesitum*.

146. As in that simple case, the principle of economy will now
justify the use of the *median*, *e.g.* in the case of two *quaesita*, putting
for the true values of *x* and *y* that point for which the sum of the
perpendiculars let fall from it on each of a set of lines representing
the given equations (properly weighted) is a minimum.^{[2]}
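
A rough Python sketch of this two-dimensional median follows, under assumptions of our own. Each given equation a x + b y = f carries a hypothetical weight w, and the perpendicular from (x, y) to it is |a x + b y − f|/√(a² + b²). The weighted sum of perpendiculars is convex and linear between the lines. Its minimum, when the lines are not all parallel, therefore falls at an intersection of two of them, so the sketch simply tries every such intersection. This brute-force search is only an illustration, not a procedure given in the text.

```python
import itertools
import math


def weighted_perpendicular_sum(x, y, lines):
    # Sum of w * (perpendicular distance from (x, y) to the line a*x + b*y = f).
    return sum(w * abs(a * x + b * y - f) / math.hypot(a, b) for a, b, f, w in lines)


def two_dim_median(lines):
    """lines: list of (a, b, f, w). Returns the point minimizing the weighted
    sum of perpendiculars, found among all pairwise intersections (O(n^3))."""
    best = None
    for (a1, b1, f1, _), (a2, b2, f2, _) in itertools.combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel lines have no single point of intersection
        x = (f1 * b2 - f2 * b1) / det
        y = (a1 * f2 - a2 * f1) / det
        cost = weighted_perpendicular_sum(x, y, lines)
        if best is None or cost < best[0]:
            best = (cost, x, y)
    if best is None:
        raise ValueError("need at least two non-parallel lines")
    return best[1], best[2]


# Hypothetical data: x = 1 and y = 2 each observed twice, plus one wild equation.
lines = [(1, 0, 1.0, 1.0), (1, 0, 1.0, 1.0), (0, 1, 2.0, 1.0), (0, 1, 2.0, 1.0),
         (1, 1, 10.0, 1.0)]
print(two_dim_median(lines))  # the median stays at x = 1, y = 2
```

Least squares applied to the same five equations would be drawn toward the outlying fifth equation. The median is unaffected by it.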

Normal Correlation.

147. The older writers have expressed the error in the determination
of one of the variables without reference to the error in the
other. But the error of one variable may be regarded
as *correlated* with that of another; that is, if the system
*x*′, *y*′ . . . forms the solution of the given equations,
while *x*′ + ξ, *y*′ + η . . . is the real system, the (small) values of
ξ, η . . . which will concur in the long run of systems from which the
given set of observations results are normally correlated. From
this point of view Bravais, in 1846, was led to several theorems
which are applicable to the now more important case of correlation
in which ξ and η are given (not in general small) deviations from
the means of two or more correlated members (organs or attributes)
forming a normal group.

148. To determine the frequency-constants of such a group it is
proper to proceed on the analogy of the simple case of one-dimensioned
error. In the case of two dimensions, for instance, the
probability p_{1} that a given pair of observations (*x*_{1}, *y*_{1}) should
have resulted from a normal group of which the means are *x*′, *y*′
respectively, the standard deviations σ_{1} and σ_{2} and the coefficient of
correlation r, may be written—

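$$p_1 = \Delta x\,\Delta y\,\Delta\sigma_1\,\Delta\sigma_2\,\Delta r\,\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\exp\left[-\frac{E^2}{2(1-r^2)}\right],$$

where

$$E^2 = \frac{(x'-x_1)^2}{\sigma_1^2} - \frac{2r(x'-x_1)(y'-y_1)}{\sigma_1\sigma_2} + \frac{(y'-y_1)^2}{\sigma_2^2}.$$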
A similar statement holds for each other pair of observations
(*x*_{2}, *y*_{2}), (*x*_{3}, *y*_{3}) . . .; with analogous expressions for *p*_{2}, *p*_{3} . . . Whence,
as in the simpler case, we have *p*_{1} × *p*_{2} × &c. × *p*_{n}/J (a constant)
for P, the a posteriori probability that the given observations should
have resulted from an assigned system of the frequency-constants.
The most probable system is determined by making P a maximum,
and accordingly equating to zero each of the following expressions—

$$\frac{dP}{dx'}, \quad \frac{dP}{dy'}, \quad \frac{dP}{d\sigma_1}, \quad \frac{dP}{d\sigma_2}, \quad \frac{dP}{dr}.$$
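
For concreteness, here is a Python sketch of the density factor in p_1, namely exp[−E²/2(1 − r²)] divided by 2πσ₁σ₂√(1 − r²). It also computes the logarithm of the product p₁ × p₂ × &c. The small increments ∆x∆y∆σ₁∆σ₂∆r are dropped, since they are the same for every system of constants and do not affect where P is greatest. The function names are hypothetical.

```python
import math


def normal_group_density(x1, y1, mx, my, s1, s2, r):
    """Density of a normal group with means (mx, my), standard deviations
    (s1, s2) and coefficient of correlation r, at the observed pair (x1, y1)."""
    if s1 <= 0 or s2 <= 0 or not -1 < r < 1:
        raise ValueError("need s1, s2 > 0 and -1 < r < 1")
    u = (mx - x1) / s1
    v = (my - y1) / s2
    e2 = u * u - 2 * r * u * v + v * v  # the quadratic form E^2
    norm = 2 * math.pi * s1 * s2 * math.sqrt(1 - r * r)
    return math.exp(-e2 / (2 * (1 - r * r))) / norm


def log_p(pairs, mx, my, s1, s2, r):
    """Log of p_1 * p_2 * ... * p_n without the constant increment factors;
    the most probable constants are those that make this a maximum."""
    return sum(math.log(normal_group_density(x, y, mx, my, s1, s2, r))
               for x, y in pairs)
```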

The values of the arithmetic mean and of the standard deviation
for each variable are those already obtained in the simple case
of one dimension. The value of r is ∑(*x*′ − *x*_{r})(*y*′ − *y*_{r})/*n*σ_{1}σ_{2}, the sum extending over the *n* pairs of observations.^{[3]} The
probable error of the determination is assigned on the assumption
that the errors to which it is liable are small.^{[4]} Such coefficients
have already been calculated for a great number of interesting cases.
For instance, the coefficient of correlation between the human
stature and femur is 0.8, between the right and left femur is 0.96,
between the statures of husbands and wives is 0.28.^{[5]}
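
A short Python sketch of these most probable values follows, assuming the observations arrive as two equal-length lists. The means are the arithmetic means, and the standard deviations use the divisor n as in the one-dimensional case. The coefficient r divides the sum of products of deviations by nσ₁σ₂. The function name is illustrative, and the sketch assumes neither variable is constant.

```python
import math


def normal_group_constants(xs, ys):
    """Most probable means, standard deviations and coefficient of
    correlation of a normal group, from n observed pairs (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists of at least two values")
    mx = sum(xs) / n
    my = sum(ys) / n
    # Standard deviations with divisor n, as in the one-dimensional case.
    s1 = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    s2 = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    # r = (sum of products of deviations) / (n * s1 * s2)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * s1 * s2)
    return mx, my, s1, s2, r


print(normal_group_constants([1, 2, 3], [1, 3, 2]))  # r comes out as 0.5, up to rounding
```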

149. This application of inverse probability to determine correlation-coefficients
and the error to which the determination is liable
has been largely employed by Professor Pearson^{[6]} and other recent
writers. The use of the normal formula to measure the probable—and
improbable—errors incident to such determinations is justified
by reasoning akin to that which has been employed in the general
proof of the law of error.^{[7]} Professor Pearson has pointed out a
circumstance which seems to be of great importance in the theory
of evolution: that the errors incident to the determination of
different frequency-coefficients are apt to be mutually correlated.
Thus if a random selection be made from a certain population, the
correlation-coefficient which fits the organs of that set is apt to differ
from the coefficient proper to the complete group in the same sense
as some other frequency-coefficients.

150. The last remark applies also to the determination of the coefficients, in particular those of correlation, by abridged methods, on principles explained with reference to the simple case. One such method uses the formula r = ∑η/∑ξ. Here ∑ξ is the sum of (some or all of) the positive (or the negative) deviations of the values for one organ or attribute, each measured by the modulus pertaining to that member. ∑η is the sum of the corresponding deviations of the other member, measured by its own modulus, which are associated with the constituents of ∑ξ. This method is certainly much less troublesome, and is perhaps not much less accurate, than the method prescribed by genuine inversion.
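
As an illustration only, the abridged formula can be sketched in Python as below. It takes ξ over the positive deviations of the first member and η over the deviations of the second member paired with them, each expressed in units of its own modulus (σ√2). The simulated normal group and both function names are assumptions made for the example. The same factor √2 appears in numerator and denominator, so expressing deviations in standard deviations instead would give the same ratio.

```python
import math
import random


def abridged_r(xs, ys):
    """Abridged estimate r = sum(eta) / sum(xi), using only the positive
    deviations xi of the first member and the associated deviations eta of
    the second, each divided by its own modulus (sigma * sqrt(2))."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    cx, cy = sx * math.sqrt(2), sy * math.sqrt(2)  # the two moduli
    sum_xi = sum_eta = 0.0
    for x, y in zip(xs, ys):
        xi = (x - mx) / cx
        if xi > 0:  # keep only the positive deviations of the first member
            sum_xi += xi
            sum_eta += (y - my) / cy
    return sum_eta / sum_xi


def simulate_pairs(r, n, seed=0):
    # Hypothetical normal group with unit standard deviations and correlation r.
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(u)
        ys.append(r * u + math.sqrt(1 - r * r) * v)
    return xs, ys


xs, ys = simulate_pairs(0.6, 20000)
print(round(abridged_r(xs, ys), 2))  # near 0.6, with sampling error of order 0.01
```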

151. A method of rejecting data analogous to the use of percentiles
in one dimension is practised when, given the frequency of observations
for each increment of area, *e.g.* each ∆*x* ∆*y*, we utilize only
the frequency for *integral* areas. Mr Sheppard has given an elegant
solution of the problem: to find the correlation between two
attributes, given the medians L and M of a normal group for the
respective attributes and the distribution of the total group among
the four quadrants which those medians mark out, as follows.^{[8]}

|         | Below L | Above L |
|---------|---------|---------|
| Below M | P       | R       |
| Above M | R       | P       |

Fig. 12.

If cos D is put for r, the coefficient of correlation, it is found
that D = πR/(P + R). For example, let the group of statistics
relating to dice already cited^{[9]} from Professor Weldon be arranged
in four quadrants by a horizontal and a vertical line, each of which
separates the total group into two halves: lines whose equations
prove to be respectively *y* = 6.11 and *x* = 6.156. For R we
have 1360.5, and for P 687.5 roughly. Whence D = π × 0.66;
r = cos (0.66π) = −½ nearly, as it ought; the negative sign being
required by the circumstance that the lower part of Mr Sheppard's
diagram shown in fig. 12 corresponds to the upper part of Professor
Weldon's diagram shown in par. 115.
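
Sheppard's rule reduces to a one-line computation. The Python sketch below applies it to the dice figures just quoted. It then applies the rule, as an informal check under our own assumptions, to a simulated normal group with r = 0.5. The helper `quadrant_counts` and the simulation are hypothetical, not part of the text.

```python
import math
import random
import statistics


def sheppard_r(P, R):
    """r = cos D with D = pi * R / (P + R); P is the frequency where both
    members fall on the same side of their medians, R where they fall on
    opposite sides."""
    return math.cos(math.pi * R / (P + R))


# The dice figures quoted in the text.
print(round(sheppard_r(687.5, 1360.5), 2))  # → -0.49


def quadrant_counts(xs, ys):
    """Hypothetical helper: totals on the same side (P) and on opposite
    sides (R) of the medians L and M. In a symmetric table these totals give
    the same ratio R / (P + R) as single cells do."""
    L, M = statistics.median(xs), statistics.median(ys)
    same = sum(1 for x, y in zip(xs, ys) if (x > L) == (y > M))
    return same, len(xs) - same


# Informal check on a simulated normal group with r = 0.5.
rng = random.Random(2)
xs, ys = [], []
for _ in range(20000):
    u, v = rng.gauss(0, 1), rng.gauss(0, 1)
    xs.append(u)
    ys.append(0.5 * u + math.sqrt(0.75) * v)
P, R = quadrant_counts(xs, ys)
print(round(sheppard_r(P, R), 2))  # close to 0.5
```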

152. Necessity rather than convenience is sometimes the motive
for resort to percentiles. Professor Pearson has applied the median
method to determine the correlation between husbands and wives
in respect of the darkness of eye-colour, a character which does not
admit of exact graduation: “our numbers merely refer to certain
groupings, arranged, it is true, in increasing darkness of colour, but
in no way corresponding to equal increases in colour-intensity.”^{[10]}
From data of this sort, having ascertained the number of husbands
with eye-colours above the median tint who marry wives with eye-colour
above the median tint, Professor Pearson finds for r the
coefficient of correlation +0.1. A general method for determining
the frequency-constants when the data are, or are taken to be,
of the integral sort has been given by Professor Pearson.^{[11]} Attention
should also be called to Mr Yule's treatment of the problem by a
sort of logical calculus on the lines of Boole and Jevons.^{[12]}

Abnormal Correlation.

153. In the cases of correlation which have been so far considered,
it has been presupposed that the things correlated range according
to the normal law of error. But now, suppose the law
of distribution to be no longer normal: for instance, that
the dots on the plane of *xy*,^{[13]} representing each a pair of
members, are no longer grouped in elliptic (or circular) rings of
equal frequency, that the locus of the maximum *y* deviation,
corresponding to an assigned *x* deviation, is no longer a right
line. How is the interdependence of these deviations to be
formulated? It is submitted that such data may be treated as if
they were normal: by an extension of the *Method of Least Squares*,
in two or more dimensions.^{[14]} Thus when the amount of pauperism
together with the amount of outdoor relief is plotted in several unions
there is obtained a distribution far from normal. Nevertheless if
the average pauperism and average outdoor relief are taken for
*aggregates*—say quintettes or decades—of unions taken at random, it
may be expected that these means will conform to the normal law,
with coefficients obtained from the original data, according to the
rule which is proper to the case of the normal law.^{[15]} By obtaining
averages conforming to the normal law, as by the simple application
of the method of least squares, we should not indeed have utilized
the whole of our data, but we shall put a part of it in a very useful

- [1] Above, par. 130.
- [2] See *Phil. Mag.* (1888), “On a New Method of Reducing Observations”, where a comparison in respect of convenience and accuracy with the received method is attempted.
- [3] Corresponding to the of pars. 14, 127 above.
- [4] Pearson, *Trans. Roy. Soc.*, A, 191, p. 234.
- [5] Pearson, *Grammar of Science*, 2nd ed., pp. 402, 431.
- [6] *Trans. Roy. Soc.* (1898), A, vol. 191; *Biometrika*, ii. 273.
- [7] Above, par. 107. Compare the proof of the “Subsidiary Law of Error,” as the law in this connexion may be called, in the paper on “Probable Errors,” *Journ. Stat. Soc.* (June 1908).
- [8] *Trans. Roy. Soc.* (1899), A, 192, p. 141.
- [9] Above, par. 115.
- [10] *Grammar of Science*, p. 432.
- [11] *Trans. Roy. Soc.*, A, vol. 195. In this connexion reference should also be made to Pearson's theory of “Contingency” in his thirteenth contribution to the “Mathematical Theory of Evolution” (*Drapers' Company Research Memoirs*).
- [12] *Trans. Roy. Soc.* (1900), A, 194, p. 257; (1901), A, 197, p. 91.
- [13] Above, par. 127.
- [14] Above, par. 116.
- [15] If from the given set of *n* observations (each corresponding to a point on the plane *xy*) there is derived a set of *n*/*s* observations, each obtained by averaging a batch of *s* of the original observations, the coefficient of correlation for the derived system is the same as that which pertains to the original system. As to the standard deviation for the new system, see note to par. 135.
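
The statement in the last note above is that averaging random batches leaves the coefficient of correlation unchanged. It can be checked informally with a Python sketch. The skewed data, the batch size of ten and the function names are all assumptions for illustration. The data are sums of exponential variables, built so that r = 0.5. The two printed coefficients should agree up to sampling error.

```python
import math
import random


def corr(xs, ys):
    # Product-moment coefficient of correlation (the divisor n cancels).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def batch_means(xs, ys, s, seed=0):
    # Shuffle the pairs at random, then average consecutive batches of s pairs.
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    rng.shuffle(pairs)
    out_x, out_y = [], []
    for i in range(0, len(pairs) - s + 1, s):
        batch = pairs[i:i + s]
        out_x.append(sum(p[0] for p in batch) / s)
        out_y.append(sum(p[1] for p in batch) / s)
    return out_x, out_y


# Hypothetical skewed data: x = e1 + e2, y = e1 + e3 with exponential e's, so r = 0.5.
rng = random.Random(1)
e = [[rng.expovariate(1.0) for _ in range(3)] for _ in range(50000)]
xs = [a + b for a, b, _ in e]
ys = [a + c for a, _, c in e]
bx, by = batch_means(xs, ys, 10)
print(round(corr(xs, ys), 2), round(corr(bx, by), 2))  # both near 0.5
```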