putting *r* = OH, *r*′ = OK; as *r* = 2α sin θ, *r*′ = 2α sin θ′,

M_{1} = (64α^{2}/9π^{2}) ∫_{0}^{π} dθ ∫_{0}^{θ} sin^{3}θ sin^{3}θ′ sin (θ − θ′) dθ′.

Professor Sylvester has remarked that this double integral, by means of a general theorem on definite integrals of this class, is easily evaluated; the value of

∫_{0}^{π} dθ ∫_{0}^{θ} sin^{3}θ sin^{3}θ′ sin (θ − θ′) dθ′

is 35π/256.

∴ M_{1} = 35α^{2}/36π; ∴ M = (35/48π^{2})πα^{2}.

From this mean value we pass to the probability that four points within a circle shall form a re-entrant figure, viz.

*p* = 35/(12π^{2}).
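
This value, *p* = 35/(12π²) ≈ 0.2955, lends itself to an empirical check. The sketch below (a Monte Carlo experiment, not part of the original text; the function names are illustrative) draws four points uniformly in a disk and counts how often one of them falls inside the triangle of the other three, i.e. how often the four points form a re-entrant figure.

```python
import math
import random

def random_point_in_disk(rng, radius=1.0):
    """Uniform point in a disk, by rejection sampling from the square."""
    while True:
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            return (x, y)

def signed_area(p, q, r):
    """Twice the signed area of triangle pqr (cross product)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def in_triangle(p, a, b, c):
    """True if p lies inside (or on) triangle abc: the three signed
    areas must not have mixed signs."""
    d1 = signed_area(a, b, p)
    d2 = signed_area(b, c, p)
    d3 = signed_area(c, a, p)
    has_neg = (d1 < 0) or (d2 < 0) or (d3 < 0)
    has_pos = (d1 > 0) or (d2 > 0) or (d3 > 0)
    return not (has_neg and has_pos)

def reentrant_probability(trials, seed=0):
    """Fraction of trials in which one of four random points in the
    disk lies within the triangle of the other three."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = [random_point_in_disk(rng) for _ in range(4)]
        for i in range(4):
            others = pts[:i] + pts[i + 1:]
            if in_triangle(pts[i], *others):
                hits += 1
                break
    return hits / trials
```

With some tens of thousands of trials the estimate settles near 35/(12π²) ≈ 0.2955, the probability derived above.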

94. The function of expectation in this class of problem appears
to afford an additional justification of the position here assigned
to this conception^{[1]} as distinguished from an average in the more
general sense which is proper to the following Part.

Part II.—Averages and Laws of Error

95. *Averages*.—An average may be defined as a quantity
derived from a given set of quantities by a process such that,
if the constituents become all equal, the average will coincide
with the constituents, and the constituents not being equal,
the average is greater than the least and less than the greatest
of the constituents. For example, if *x*_{1}, *x*_{2}, . . . *x*_{n}, are the constituents,
the following expressions form averages (called respectively
the arithmetic, geometric and harmonic means):—

(*x*_{1} + *x*_{2} + . . . + *x*_{n})/*n*.

(*x*_{1} × *x*_{2} × . . . × *x*_{n})^{1/n}.

1 / [(1/*n*)(1/*x*_{1} + 1/*x*_{2} + . . . + 1/*x*_{n})].

The conditions of an average are likewise satisfied by innumerable other symmetrical functions, for example:—

{(*x*_{1}^{2} + *x*_{2}^{2} + . . . + *x*_{n}^{2})/*n*}^{½}.
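
These four averages may be computed directly; the sketch below (illustrative, with assumed function names, and assuming positive constituents for the geometric and harmonic means) can be used to verify the defining conditions: each average lies between the least and the greatest constituent, and all coincide when the constituents are equal.

```python
import math

def arithmetic_mean(xs):
    """Sum of the constituents divided by their number."""
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """n-th root of the product; assumes positive constituents."""
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(xs) / sum(1 / x for x in xs)

def root_mean_square(xs):
    """Square root of the mean of the squares."""
    return (sum(x * x for x in xs) / len(xs)) ** 0.5
```

For positive, unequal constituents these stand in the fixed order harmonic < geometric < arithmetic < root-mean-square.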

The conception may be extended from symmetrical to unsymmetrical
functions by supposing any one or more of the constituents
in the former to be repeated several times. Thus if in the
first of the averages above instanced (the arithmetic mean) the
constituent *x*_{r} occurs *l* times, the expression is to be modified
by putting *lx*_{r} for *x*_{r} in the numerator and, in the denominator,
*n* + *l* − 1 for *n*. The definition of an average covers a still wider
field. The process employed need not be a *function*.^{[2]} One of
the most important averages is formed by arranging the constituents
in the order of magnitude and taking for the average
a value which has as many constituents above it as below it,
the median. The designation is also extended to that value
about which the greatest number of the constituents cluster most
closely, the “centre of greatest density,” or (with reference to
the geometrical representation of the grouping of the constituents)
the greatest ordinate, or, as recurring most frequently,
the mode.^{[3]} But to comply with the definition there must be
added the condition that the mode does not occur at either
extremity of the range between the greatest and the least of the
constituents. There should be also in general added a definition
of the process by which the mode is derived from the given
constituents.^{[4]} Perhaps this specification may be dispensed
with when the number of the constituents is indefinitely large.
For then it may be presumed that *any* method of determining
the mode will lead to the same result. This presumption presupposes
that the constituents are quantities of the kind which
form the sort of “series” which is proper to Probabilities.^{[5]} A
similar presupposition is to be made with respect to the constituents
of the other averages, so far as they are objects of
probabilities.
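
The two non-functional averages just described, the median and the mode, may be sketched as follows (illustrative names; for the mode a simple frequency count stands in for the smoothing process which, as noted above, should in general be specified for continuous constituents).

```python
from collections import Counter

def median(xs):
    """Middle value after arranging the constituents in order of
    magnitude; mean of the two middle values when their number is even,
    so that as many constituents lie above it as below it."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def mode(xs):
    """Most frequently occurring constituent -- the 'centre of greatest
    density' for discrete data."""
    counts = Counter(xs)
    return counts.most_common(1)[0][0]
```

Python's standard `statistics` module provides `median` and `mode` with the same behaviour for these discrete cases.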

96. *The Law of Error*.—Of the propositions respecting
average with which Probabilities is concerned the most important
are those which deal with the relation of the average to its constituents,
and are commonly called “laws of error.” Error is
defined in popular dictionaries as “deviation from truth”;
and since truth commonly lies in a mean, while measurements
are some too large and some too small, the term in scientific
diction is extended to deviations of statistics from their average,
even when that average—like the mean of human or barometric
heights—does not stand for any real objective thing. A “law
of error” is a relation between the extent of a deviation and the
frequency with which it occurs: for instance, the proposition
that if a digit is taken at random from mathematical tables, the
difference between that figure and the mean of the whole series
(indefinitely prolonged) of figures so obtained, namely, 4.5, will
in the long run prove to be equally often ±0.5, ±1.5, ±2.5, ±3.5,
±4.5.^{[6]} The assignment of frequency to *discrete* values—as 0,
1, 2, &c., in the preceding example—is often replaced by a
continuous curve with a corresponding equation. The distinction
of being *the* law of error is bestowed on a function which is
applicable not merely to one sort of statistics—such as the digits
above instanced—but to the great variety of miscellaneous
groups, generally at least, if not universally. What form is
most deserving of this distinction is not decided by uniform
usage; different authorities do not attach the same weight to
the different grounds on which the claim is based, namely the
extent of cases to which the law may be applicable, the closeness
of the application, and the presumption prior to specific experience
in favour of the law. The term “the law of error” is here
employed to denote (1) a species to which the title belongs by
universal usage, (2) a wider class in favour of which there is the
same sort of a priori presumption as that which is held to justify
the more familiar species. The law of error thus understood
forms the subject of the first section below.
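
The proposition about random digits can be tested numerically. In the sketch below (illustrative names, not from the original text), each absolute deviation 0.5, 1.5, 2.5, 3.5, 4.5 from the mean 4.5 corresponds to exactly two of the ten digits, so each class should in the long run appear with frequency 1/5.

```python
import random
from collections import Counter

def deviation_frequencies(n_digits, seed=0):
    """Relative frequencies of |digit - 4.5| for n random decimal
    digits; each of the five classes covers two digits, so each
    frequency tends to 2/10 = 0.2 in the long run."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_digits):
        d = rng.randrange(10)  # a digit taken "at random"
        counts[abs(d - 4.5)] += 1
    return {dev: counts[dev] / n_digits
            for dev in (0.5, 1.5, 2.5, 3.5, 4.5)}
```

This is the discrete law of error of the digit example: frequency is here independent of the extent of the deviation.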

97. *Laws of Frequency*.—What other laws of error may
require notice are included in the wider genus “laws of frequency,”
which forms the subject of the second section. Laws
of frequency, so far as they belong to the domain of Probabilities,
relate to much the same sort of grouped statistics as laws of
error, but do not, like them, connote an explicit reference to an
average. Thus the sequence of random digits above instanced
as affording a law of error, considered without reference to the
mean value, presents the law of frequency that one digit occurs as
often as another (in the long run). Every law of error is a law
of frequency; but the converse is not true. For example, it is a
law of frequency—discovered by Professor Pareto^{[7]}—that the
number of incomes of different size (above a certain size) is
approximately represented by the equation *y* = A/*x*^{a}, where *x*
denotes the size of an income, *y* the number of incomes of that
size. But whether this generalization can be construed as a law
of error (in the sense here defined) depends on the nice inquiry
whether the point from which the frequency diminishes as the
income *x* increases can be regarded as a “mode,” *y* diminishing
as *x* decreases from that point.
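Pareto's law *y* = A/*x*^{a} can be illustrated by simulation. The sketch below (hypothetical helper names; it uses Python's standard Pareto generator, which yields incomes with minimum size 1) counts the incomes exceeding a threshold and recovers the exponent *a* from the slope of log *y* against log *x*.

```python
import math
import random

def simulate_incomes(n, alpha, seed=0):
    """Incomes drawn from a Pareto distribution with minimum 1,
    so that the number exceeding x is proportional to x**(-alpha)."""
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n)]

def count_above(incomes, x):
    """Number of incomes exceeding size x -- the 'y' of Pareto's law."""
    return sum(1 for v in incomes if v > x)

def estimate_exponent(incomes, x1, x2):
    """Slope of log N(>x) against log x between two thresholds;
    both thresholds must leave a nonzero count."""
    n1 = count_above(incomes, x1)
    n2 = count_above(incomes, x2)
    return (math.log(n2) - math.log(n1)) / (math.log(x2) - math.log(x1))
```

For simulated incomes with exponent α the estimated slope comes out close to −α, in accordance with *y* = A/*x*^{a}.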

- ↑ See introductory remarks and note to par. 95.
- ↑ A great variety of (functional) averages, including those which are best known, are comprehended in the following general form: φ^{−1}{M[φ(*x*_{1}), φ(*x*_{2}), . . . φ(*x*_{n})]}, where φ is an arbitrary function, φ^{−1} its inverse (such that φ^{−1}(φ(*x*)) ≡ *x*), and M any (functional) mean. When M denotes the arithmetic mean, if φ(*x*) ≡ log *x* (φ^{−1}(*x*) ≡ *e*^{x}) we have the geometric mean; if φ(*x*) ≡ 1/*x*, the harmonic mean. Of this whole class of averages it is true that the average of several averages is equal to the average of all their constituents.
- ↑ This convenient term was introduced by Karl Pearson.
- ↑ *E.g.* some specified method of *smoothing* the given statistics.
- ↑ See above, pt. i., pars. 3 and 4. Accordingly the expected value of the sum of *n* (similar) constituents (*x*_{1} + *x*_{2} + . . . + *x*_{n}) may be regarded as an average, the average value of *nx*_{r}, where *x*_{r} is any one of the constituents.
- ↑ See as to the fact and the evidence for it, Venn, *Logic of Chance*, 3rd ed., pp. 111, 114. Cf. *Ency. Brit.*, 8th ed., art. “Probability,” p. 592; Bertrand, *op. cit.*, preface § ii.; above, par. 59.
- ↑ See his *Cours d'économie politique*, ii. 306. Cf. Bowley, Evidence before the Select Committee on Income Tax (1906, No. 365, Question 1163 seq.); Benini, *Metodologica statistica*, p. 324, referred to in the *Journ. Stat. Soc.* (March, 1909).