numerous set of observations, say *x*_{1}, *x*_{2}, . . . *x*_{n} (taken as a sample
from an indefinitely large group obeying one and the same law of
frequency) varies from set to set approximately according to the
following law (to be established later)

, say;

where *c*^{2}/2 = the mean square of deviation, and *j* = the mean
cube of deviation, and *j*/*c*^{3}, say *j*, is small. Then, by abstraction
analogous to that which has just been attributed to the
method of least squares, we may regard the datum as a single
observation, the arithmetic mean (of a sample batch of observations)
subject to the law of error *z* = *f*(*x*). The most probable
value of the *quaesitum* is therefore given by the equation *f*′(*x* − *x*′) = 0,
where *x*′ is the arithmetic mean of the given observations.
From the resulting quadratic equation, putting *x* = *x*′ + ε, and
recollecting that ε is small we have ε = j*c*. That is the correction
due to the utilization of the mean cube of error. The *most advantageous*
solution cannot now be determined,^{[1]} *f*(*x*) being unsymmetrical,
without assuming a particular form for the function of detriment.
This method of least squares *plus* cubes may easily be extended to
the case of several batches.

137. This application of probabilities not to the actual data but
to a selected part thereof, this economy of the inverse method, is
widely practised in miscellaneous statistics, where the object is to
determine whether the discrepancy between two sets of observation
is accidental or significant of a real difference.^{[2]} For instance, let
the data be ages at death of individuals of two classes (*e.g.* temperate
or not so, urban or rural, &c.) who have been under observation,
since the age of, say, 20. Granted that the ages at death conform
to Gompertz's law; the determination of the *modal* age at death,
that age at which the proportion of the total observed dying (per
unit of time) is a maximum for each class, would most perfectly
be effected by the genuine inverse method. That method will also
enable us to determine the probability that the two modes should
have differed to the observed extent by mere accident.^{[3]} According
to the abridged method it suffices to proceed as if our data consisted
of two observations *x*′ and *y*′, the average ages at death
of the two classes, each average obeying the normal law of error,
with respective moduli $\sqrt{2\sum (x' - x_r)^2}/n$ and $\sqrt{2\sum (y' - y_s)^2}/n'$,
where *x*_{1}, *x*_{2}, &c., *y*_{1}, *y*_{2}, &c.,
are the respective sets of observed ages at death; as follows from
the law of error, whatever the law of distribution of the given
observations. According to a well-known property of the normal
law, the difference between the averages of *n* and *n*′ observations
respectively will range under a probability-curve with modulus
$\sqrt{2\sum (x' - x_r)^2/n^2 + 2\sum (y' - y_s)^2/{n'}^2}$, say *c*. Whence for the probability that a difference as
great as the observed one, say *e*, should have occurred by
chance we have ½[1 − θ(τ)], where τ = *e*/*c*, and θ(*x*) is the integral
$\frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$, given in many treatises.
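The abridged test of this paragraph may be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the writer's own computation: the function names and the ages are hypothetical, the modulus of each average is taken to be √(2 × sum of squared deviations)/*n*, the modulus of the difference is taken to be the root of the sum of the squared moduli, and θ is taken to be the error function supplied by `math.erf`.

```python
import math

def modulus_of_mean(xs):
    """Modulus of the arithmetic mean: sqrt(2 * sum of squared deviations) / n."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(2.0 * sum((x - mean) ** 2 for x in xs)) / n

def chance_of_difference(xs, ys):
    """Return (e, c, p): observed difference of the averages, modulus of that
    difference, and p = (1 - theta(e / c)) / 2 with theta = math.erf."""
    e = abs(sum(xs) / len(xs) - sum(ys) / len(ys))
    c = math.hypot(modulus_of_mean(xs), modulus_of_mean(ys))
    tau = e / c
    return e, c, 0.5 * (1.0 - math.erf(tau))

# Hypothetical ages at death for two classes (invented for illustration).
temperate = [61, 72, 68, 75, 80, 66, 71, 77, 69, 73]
intemperate = [58, 64, 70, 61, 67, 59, 72, 63, 66, 60]

e, c, p = chance_of_difference(temperate, intemperate)
print(f"e = {e:.2f}, c = {c:.2f}, tau = {e / c:.2f}, chance = {p:.4f}")
```

A small value of the last figure indicates that the observed difference is unlikely to be accidental. The sketch treats each average as a single normal observation, which presumes that the batches are numerous.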

Abridged Methods.

138. This sort of abridgment may be extended to other kinds of
average besides the arithmetic, in particular the *median* (that point
which has as many of the given observations above as
below it). By simple induction we know that the
median of a large sample of observations is a probable
value for the *true* median; how probable is determined as follows
from a selection of our data. First suppose that all the observations
are of the same weight. If *x*′ were the true median,
the probability that as many as ½*n* + *r* of the observations should
fall on either side of that point is given by the normal law for which
the *exponent* is −2*r*^{2}/*n*.^{[4]} This probability that the observed median
will differ from the true one by a certain number of observations is
connected with the probability that they will differ by a certain
extent of the abscissa, by the proposition that the number of observations
contained between the true and apparent median is equal
to the small difference between them multiplied by the density of
observations at the median—in the case of normal and generally
symmetrical curves the greatest ordinate. This is the second datum
we require to select. In the case of the normal curve it may be
calculated from the modulus itself, determined by induction from a
selection of data. If the observations are not all of the same worth,
weight may be assigned by counting one observation as if it occurred
oftener than another. This is the essence of Laplace's Method
of Situation.^{[5]}
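In the case of the normal curve and of equal weights the two data just described may be combined in one step. The following is a sketch only: ε (the distance between the apparent and the true median) and *c* (the modulus of the given observations) are symbols introduced here for illustration, and the density at the median is taken to be the greatest ordinate 1/(√π *c*).

```latex
r = \frac{n\,\varepsilon}{\sqrt{\pi}\,c},
\qquad
\exp\!\left(-\frac{2r^{2}}{n}\right)
  = \exp\!\left(-\frac{2n\,\varepsilon^{2}}{\pi c^{2}}\right)
  = \exp\!\left(-\frac{\varepsilon^{2}}{C^{2}}\right),
\qquad
C = c\,\sqrt{\frac{\pi}{2n}} .
```

On these assumptions the median ranges under a normal law with modulus *c*√(π/2*n*), against *c*/√*n* for the arithmetic mean; the ratio of the two is √(π/2), or about 1.25.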

139. In its simplest form, where all the given observations are of
equal weight, this method is of wide applicability. Compared
with the genuine inverse method, it is always more convenient,
seldom much less accurate, sometimes even more accurate. If the
given observations obey the normal law, the precision of the median
is less than the precision of the arithmetic mean by only some
25%—a discrepancy not very serious where only a rough estimate of the
worth of an average is required. If the observations do not obey
the normal law—especially if the extremities are abnormally
divergent—the precision of the median may be greater than that of the
arithmetic mean.^{[6]}
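The comparison may be tried by experiment. The sketch below is an illustration under assumptions of our own, not a computation from the sources cited: the function names, the sample size, the number of trials and the "discordant" mixture (one observation in ten drawn from a source ten times as dispersed) are all hypothetical choices.

```python
import random
import statistics

def spread_ratio(draw, n=101, trials=2000, seed=1):
    """Ratio (spread of sample medians) / (spread of sample means),
    estimated from repeated samples of n observations each."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pstdev(medians) / statistics.pstdev(means)

def normal(rng):
    # Observations obeying the normal law.
    return rng.gauss(0.0, 1.0)

def discordant(rng):
    # Hypothetical heterogeneous group with abnormally divergent extremities.
    return rng.gauss(0.0, 10.0 if rng.random() < 0.1 else 1.0)

ratio_normal = spread_ratio(normal)
ratio_discordant = spread_ratio(discordant)
print(f"normal law: median/mean spread = {ratio_normal:.2f}")
print(f"discordant: median/mean spread = {ratio_discordant:.2f}")
```

A ratio above unity means that the median is the less precise of the two averages; a ratio below unity means that it is the more precise.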

Determination of Frequency-Constants.

140. Yet another instance of the contrast between genuine and
abridged inversion is afforded by the problem to determine the
modulus as well as the mean for a set of observations
known to obey the normal law; what the first problem^{[7]}
becomes when the coefficient of dispersion is not given.
By inverse probability we ought in that case, in addition
to the equation *d*P/*dx* = 0, to put *d*P/*dc* = 0. Whence
*c*^{2} = 2[(*x*′ − *x*_{1})^{2} + (*x*′ − *x*_{2})^{2} + &c. + (*x*′ − *x*_{n})^{2}]/*n*, and
*x*′ = (*x*_{1} + *x*_{2} + &c. + *x*_{n})/*n*. This solution differs from that which is
often given in the textbooks^{[8]} in that there, in the expression for
*c*^{2}, (*n* − 1) occurs in the denominator instead of *n*. The difference
is explained by the fact that the authorities referred to determine c,
not by genuine inversion, but by ordinary induction, by a condition
which certainly would be fulfilled in the long run, but does not
express the whole of our data; a condition in this respect like the
equation of c to , where e is the difference (taken positively,
without regard to its sign) between any observation and the arithmetic
mean of all the observations.^{[9]}
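The two expressions for *c*^{2} may be set side by side. The following is a minimal sketch; the function name and the five observations are hypothetical and serve only to show the arithmetic.

```python
def modulus_squared(xs, textbook=False):
    """c^2 = 2 * (sum of squared deviations from the mean) / d, where
    d = n for the solution by inverse probability and
    d = n - 1 for the solution often given in the textbooks."""
    n = len(xs)
    mean = sum(xs) / n
    sum_sq = sum((mean - x) ** 2 for x in xs)
    return 2.0 * sum_sq / ((n - 1) if textbook else n)

observations = [1.0, 2.0, 3.0, 4.0, 5.0]  # invented for illustration
c2_inverse = modulus_squared(observations)
c2_textbook = modulus_squared(observations, textbook=True)
print(c2_inverse, c2_textbook)
```

The two values stand in the ratio (*n* − 1)/*n*, so that the difference becomes insensible as the observations grow numerous.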

141. Of course the determination of the most probable value is
subject to the speculative difficulties proper to a priori probability:
which are particularly striking in this case, as it appears equally
natural to take as that constant, of which the values are a priori
equally probable, *k*( = *c*^{2}/2), or even^{[10]} *h*( = 1/*c*^{2}), the measure of
weight, as in fact Laplace has done;^{[11]} yet no two of these assumptions
can be exactly true.^{[12]}

142. A more convenient determination is obtained from simple
induction by equating the modulus to some datum of the observed
group to which it would be equal if the group were complete—in
particular to the distance from the median of some percentile
(or point which marks off a certain percentage, *e.g.* 25 of the given
observations) multiplied by a factor corresponding to the percentile
obtainable from a familiar table. Mr Sheppard has given an interesting
proof^{[13]} that we cannot by way of percentiles obtain such good^{[14]}
results for the frequency-constants as by the use of “the average
and average square” [the method prescribed by inverse probability].
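The determination by percentiles may be illustrated for the quartiles. This is a sketch under assumptions of our own: the function names and the simulated group are hypothetical, the modulus is taken to be √2 times the standard deviation, the factor is computed from `statistics.NormalDist` in place of a printed table, and the mean of the two quartile distances from the median (half the interquartile range) is used as the datum.

```python
import math
import random
import statistics
from statistics import NormalDist

def percentile_factor(p):
    """Factor by which the distance of the p-th percentile from the median
    (0 < p < 1, p != 0.5) is multiplied to give the modulus of a normal group."""
    z = NormalDist().inv_cdf(p)  # distance in units of the standard deviation
    return math.sqrt(2.0) / abs(z)

def modulus_from_quartiles(xs):
    """Estimate the modulus from the mean quartile distance from the median."""
    q1, _median, q3 = statistics.quantiles(xs, n=4)
    return 0.5 * (q3 - q1) * percentile_factor(0.75)

# Hypothetical group: 5000 observations, mean 50, standard deviation 3.
rng = random.Random(0)
sample = [rng.gauss(50.0, 3.0) for _ in range(5000)]
c_est = modulus_from_quartiles(sample)
print(f"factor = {percentile_factor(0.75):.4f}, modulus = {c_est:.3f}")
```

For the simulated group the modulus so obtained should fall near 3√2; how near is the question of precision which the proof cited above addresses.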

Entangled Measurements.

143. The same philosophical subtleties, with greater mathematical
complications, meet us when we pass on to the case of two or more
*quaesita*. The problem under this head which mainly
exercised the older writers was to determine a number of
unknown quantities, given a larger number, *n*, of
equations involving them.

144. Supposing the true values approximately known, by substituting
the approximate values in the given equations and expanding
according to Taylor's theorem, there will be obtained for the *corrections*,
say *x*, *y* . . ., *n* linear equations of the form

*a*_{1}*x* + *b*_{1}*y* + . . . = *f*_{1}

*a*_{2}*x* + *b*_{2}*y* + . . . = *f*_{2},

where each *a* and *b* is a known coefficient, and each *f* is a
fallible observation. Suppose that the error to which each is
liable obeys the normal law, and that the modulus pertaining to each
observation is the same—which latter condition can be secured by
multiplying each equation by a proper factor—then if *x*′ and *y*′
are the true values of the *quaesita*, the frequency with which
(*a*_{1}*x*′ + *b*_{1}*y*′ − *f*_{1}) assumes different values is given by the equation
$z = \frac{1}{\sqrt{\pi}\,c_1}\exp\left[-(a_1 x' + b_1 y' - f_1)^2/c_1^2\right]$, where *c*_{1} is a constant which,

- [1] The use of the cubes is also contrasted with that of the squares (only) in this respect: that it is no longer a matter of indifference *how many* of the original observations we assign to the batch of which the mean constitutes the single (compound) observation.
- [2] The object of the writer's paper on “Methods of Statistics” in the Jubilee number of the *Journ. Stat. Soc.* (1885).
- [3] See on the use of the inverse method to determine the mode of a group, the present writer's paper on “Probable Errors” in the *Journ. Stat. Soc.* (Sept. 1908).
- [4] Above, par. 103.
- [5] *Théorie analytique*, 2nd supp. p. 164. *Mécanique céleste*, bk. iii. art. 40; on which see the note in Bowditch's translation. The method may be extended to other percentiles. See Czuber, *Beobachtungsfehler*, § 58. Cf. *Phil. Mag.* (1886), p. 375; and Sheppard, *Trans. Roy. Soc.* (1899), 192, p. 135, *ante*, where the error incident to this kind of determination is ascertained with much precision.
- [6] Cf. *Phil. Mag.* (1887), xxiv. 269 seq., where the median is prescribed in case of “discordant” (heterogeneous) observations. If the more drastic remedy of rejecting part of the data is resorted to, Sheppard's method of performing that operation may be recommended (*Proc. Lond. Math. Soc.* vol. 31). He prescribes for cases to which the median may not be appropriate, namely, the determination of other frequency-constants besides the mean of the observations.
- [7] Above, par. 134.
- [8] *E.g.* Airy, *Theory of Errors*, art. 60.
- [9] It is a nice point that the expression for *c*^{2}, which has (*n* − 1) instead of *n* for denominator, though not the more *probable*, may yet be the more *advantageous* (supposing that there were any sensible difference between the two). Cf. *Camb. Phil. Trans.* (1885), vol. xiv. pt. ii. p. 165; and “Probable Errors,” *Journ. Stat. Soc.* (June 1908).
- [10] Above, par. 96, note.
- [11] *Théorie analytique*, 2nd supp. ed. 1847, p. 578.
- [12] See the matter discussed in *Camb. Phil. Trans.*, *loc. cit.*
- [13] *Trans. Roy. Soc.* (1899), A, cxcii. 135.
- [14] Good as tested by a comparison of the mean squares of errors in the frequency-constant determined by the compared methods.