aggregate is given by a corresponding term in the expansion of
(*q* + *p*)^{*n*}, and by a well-known theorem^{[1]} this term is approximately
equal to $\frac{1}{\sqrt{2\pi npq}}\,e^{-\nu^{2}/2npq}$, where ν is the number of integers
by which the term is distant from *np* (or an integer close to *np*),
provided that ν is of (or less than) the order of √*n*. Graphically, let the
sortition made for each element be represented by the taking or
not taking, with respective frequencies *p* and *q*, of a step of length *i*.
If a body starting from zero takes successively *n* such steps, the
point at which it will most probably come to a stop is at *npi*
(measured from zero); the probability of its stopping at any neighbouring
point within a range of ±√*n*·*i* is given by the
above-written law of frequency, ν*i* being the distance of the stopping-point
from *npi*. Put ν*i* = *x* and 2*npqi*^{2} = *c*^{2}; then the probability
may be written $\frac{i}{\sqrt{\pi}\,c}\,e^{-x^{2}/c^{2}}$.
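The approximation can be tried numerically. The following minimal Python sketch (the values *n* = 400, *p* = 0.3 and *i* = 1.5 are assumptions chosen for illustration, not taken from the text) sets the exact binomial term beside 1/√(2π*npq*)·*e*^{−ν²/2*npq*}, and checks that the same number results when the law is written with *x* = ν*i* and *c*² = 2*npqi*², i.e. as *i*/(√π *c*)·*e*^{−*x*²/*c*²}.

```python
import math

def binomial_term(n: int, p: float, s: int) -> float:
    """Exact probability that exactly s of the n steps are taken."""
    return math.comb(n, s) * p**s * (1.0 - p) ** (n - s)

def approx_term(n: int, p: float, nu: float) -> float:
    """1/sqrt(2 pi n p q) * exp(-nu^2 / 2npq), nu being counted from np."""
    npq = n * p * (1.0 - p)
    return math.exp(-nu**2 / (2 * npq)) / math.sqrt(2 * math.pi * npq)

def stopping_point_prob(x: float, c: float, i: float) -> float:
    """The same law with x = nu*i and c^2 = 2npq i^2: i/(sqrt(pi) c) * exp(-x^2/c^2)."""
    return i / (math.sqrt(math.pi) * c) * math.exp(-x**2 / c**2)

n, p, i = 400, 0.3, 1.5            # illustrative values (assumed)
centre = round(n * p)              # np = 120 for these values
c = math.sqrt(2 * n * p * (1 - p)) * i
for nu in (0, 5, 10, 20):
    exact = binomial_term(n, p, centre + nu)
    approx = approx_term(n, p, nu)
    alt = stopping_point_prob(nu * i, c, i)
    print(f"nu={nu:2d}  exact={exact:.5f}  approx={approx:.5f}  via x, c={alt:.5f}")
```

The relative discrepancy is smallest near *np*, in keeping with the proviso that ν be not beyond the order of √*n*.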

104. It is a short step, but a difficult one, from this case, in
which the element is *binomial* (heads or tails), to the general case,
in which the element has several values according to a law
of frequency, consisting, for instance, of the number of points presented
by a randomly thrown die. According to the general
theorem, if Q is the sum^{[2]} of numerous elements, each of which
assumes different magnitudes according to a law of frequency,
*z* = *f*_{r}(*x*), the function *f* being in general different for different
elements, the number of times that Q assumes magnitudes
between *x* and *x* + ∆*x* in the course of N trials is N*z*∆*x*, if
$z = \frac{1}{\sqrt{2\pi k}}\,e^{-(x-a)^{2}/2k}$; where *a* is the sum of the arithmetic
means of all the elements, the mean of any one of which is *a*_{r} = [∫*x f*_{r}(*x*)*dx*], the
square brackets denoting that the integrations extend between the
extreme limits of the element's range, if the frequency-locus for each
element is continuous, it being understood that [∫*f*_{r}(*x*)*dx*] = 1;
and *k* is the sum of the mean squares of error for the several elements,
*k* = ∑[∫ξ^{2}*f*_{r}(*a*_{r} + ξ)*d*ξ], if the frequency-locus for each element is continuous,
where *a*_{r} is the arithmetic mean of one of the elements,
and ξ the deviation of any value assumed by that element from *a*_{r},
∑ denoting summation over all the elements. When the frequency-locus
for the element is not continuous, the *integrations* which give
the arithmetic mean and mean square of error for the element
must be replaced by *summations*. For example, in the case of the
dice above instanced, the law of frequency for each element is that
it assumes equally often each of the values 1, 2, 3, 4, 5, 6. Thus the
arithmetic mean for each element is 3.5, and the mean square of
error is {(3.5 − 1)^{2} + (3.5 − 2)^{2} + &c.}/6 = 35/12 ≈ 2.917. Accordingly, the
sum of the points obtained by tossing a large number, *n*, of dice
at random will assume a particular value *x* with a frequency which
is approximately assigned by the equation

$$z = \frac{1}{\sqrt{2\pi\,(35/12)\,n}}\,\exp\!\left(-\frac{(x-3.5\,n)^{2}}{2\,(35/12)\,n}\right).$$
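The dice example can be checked exactly. In this minimal Python sketch (the number of dice, *n* = 20, is an assumption for illustration), the mean and mean square of error of one die are found by summation with exact fractions, the exact law of frequency for the sum is built by repeated convolution, and each value is compared with *z* = (2π*k*)^{−1/2}*e*^{−(*x*−*a*)²/2*k*}, taking *a* = 3.5*n* and *k* = (35/12)*n*.

```python
from fractions import Fraction
import math

FACES = range(1, 7)
MEAN = Fraction(sum(FACES), 6)                          # arithmetic mean: 7/2
MEAN_SQ_ERR = sum((f - MEAN) ** 2 for f in FACES) / 6   # mean square of error: 35/12

def exact_law(n: int) -> dict:
    """Exact law of frequency for the sum of n fair dice, by repeated convolution."""
    law = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for total, prob in law.items():
            for f in FACES:
                new[total + f] = new.get(total + f, Fraction(0)) + prob / 6
        law = new
    return law

def normal_frequency(x: float, n: int) -> float:
    """z = exp(-(x - a)^2 / 2k) / sqrt(2 pi k), with a = 3.5n and k = (35/12)n."""
    a = float(MEAN) * n
    k = float(MEAN_SQ_ERR) * n
    return math.exp(-(x - a) ** 2 / (2 * k)) / math.sqrt(2 * math.pi * k)

n = 20                       # number of dice (illustrative)
law = exact_law(n)
for x in (50, 60, 70, 80, 90):
    print(f"x={x}  exact={float(law[x]):.5f}  approx={normal_frequency(x, n):.5f}")
```

Because the sum takes only integer values, the exact probability of a particular total is compared directly with *z*, that is, with ∆*x* = 1.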

The rule equally applies to the case in which the elements are not
similar; one might be the number of points on a die, another the
number of points on a domino, and so on. Graphically, each
element is no longer represented by a step which is either null or *i*,
but by a step which may be, with an assigned probability, one or
other of several degrees between those limits, the law of frequency
and the range of *i* being different for the different elements.
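For dissimilar elements the same recipe applies: sum the arithmetic means to get *a* and the mean squares of error to get *k*. In the minimal sketch below, each domino is assumed to be drawn independently at random from a full standard double-six set of 28 tiles, its element being the total number of pips; that assumption, and the choice of ten dice plus ten dominoes, are illustrative and not taken from the text. For this domino the mean works out to 6 and the mean square of error to 9.

```python
from fractions import Fraction
import math

Law = dict   # maps each value of an element to its probability (a Fraction)

def mean_of(law: Law) -> Fraction:
    return sum(v * p for v, p in law.items())

def mean_square_error(law: Law) -> Fraction:
    m = mean_of(law)
    return sum((v - m) ** 2 * p for v, p in law.items())

def convolve(first: Law, second: Law) -> Law:
    """Law of frequency of the sum of two independent elements."""
    out: Law = {}
    for v1, p1 in first.items():
        for v2, p2 in second.items():
            out[v1 + v2] = out.get(v1 + v2, Fraction(0)) + p1 * p2
    return out

die: Law = {f: Fraction(1, 6) for f in range(1, 7)}

# Assumption: a tile drawn at random from a double-six set (28 tiles);
# the element is the total number of pips on the tile.
domino: Law = {}
for low in range(7):
    for high in range(low, 7):
        domino[low + high] = domino.get(low + high, Fraction(0)) + Fraction(1, 28)

elements = [die] * 10 + [domino] * 10     # illustrative mixture of unlike elements
compound: Law = {0: Fraction(1)}
for element in elements:
    compound = convolve(compound, element)

a = sum(mean_of(e) for e in elements)              # sum of the arithmetic means
k = sum(mean_square_error(e) for e in elements)    # sum of the mean squares of error

def normal_frequency(x: float) -> float:
    return math.exp(-(x - a) ** 2 / (2 * k)) / math.sqrt(2 * math.pi * k)

for x in (75, 85, 95, 105, 115):
    print(f"x={x}  exact={float(compound[x]):.5f}  approx={normal_frequency(x):.5f}")
```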

105. *Variant Proofs*.—The evidence of these statements can only
be indicated here. All the proofs which have been offered involve
some postulate as to the deviation of the elements from their respective
centres of gravity, their “errors.” If these errors extended to
infinity, it might well happen that the law of error would not be
fulfilled by a sum of such elements.^{[3]} The necessary and sufficient
postulate appears to be that the mean powers of deviation for the
elements, the second (above written) and the similarly formed third,
fourth, &c., powers (up to some assigned power), should be finite.^{[4]}
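The footnoted case of the locus 1/{π(1 + *x*²)} can be illustrated by simulation. This minimal Python sketch (the seed, sample sizes and the uniform comparison element are assumptions for illustration) draws the mean of *n* elements many times and reports its interquartile range. For elements uniform on (−1, 1), whose mean powers are all finite, the spread of the mean shrinks roughly as 1/√*n*; for the Cauchy locus, whose second mean power is infinite, the mean of *n* elements has the same locus as a single element, so its spread does not shrink.

```python
import math
import random
import statistics

rng = random.Random(2024)   # fixed seed so that the run is reproducible

def cauchy_element() -> float:
    """Draw from the locus 1/{pi(1 + x^2)} by inverting its distribution function."""
    return math.tan(math.pi * (rng.random() - 0.5))

def uniform_element() -> float:
    """An element whose mean powers of every order are finite: uniform on (-1, 1)."""
    return rng.uniform(-1.0, 1.0)

def spread_of_mean(draw, n: int, reps: int = 400) -> float:
    """Interquartile range of the mean of n elements over `reps` repetitions."""
    means = [sum(draw() for _ in range(n)) / n for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

results = {}
for n in (10, 100, 400):
    results[("cauchy", n)] = spread_of_mean(cauchy_element, n)
    results[("uniform", n)] = spread_of_mean(uniform_element, n)
    print(n, round(results[("cauchy", n)], 3), round(results[("uniform", n)], 4))
```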

106. (1) The proof which seems to flow most directly from this
postulate proceeds thus. It is deduced that the mean powers of
deviation for the proposed representative curve, the law of error
(up to a certain power), differ from the corresponding powers of the
actual locus by quantities which are negligible when the number of
the elements is large.^{[5]} But loci which have their mean powers of
deviation (up to some certain power) approximately equal may be
considered as approximately coincident.^{[6]}
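The criterion of mean powers can be watched at work on the dice. This minimal Python sketch (the dice are an illustrative choice) computes, exactly, the mean second and fourth powers of deviation for the sum of *n* dice and forms the ratio of the fourth to the square of the second, which is 3 for the normal law of error. For sums of independent like elements this ratio equals 3 + (μ₄/μ₂² − 3)/*n*, μ₂ and μ₄ being the element's mean second and fourth powers; for a die it is 3 − 222/(175*n*), so the discrepancy dies away as the number of elements grows.

```python
from fractions import Fraction

FACES = range(1, 7)

def sum_of_dice_law(n: int) -> dict:
    """Exact law of frequency for the total of n fair dice."""
    law = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for total, prob in law.items():
            for f in FACES:
                new[total + f] = new.get(total + f, Fraction(0)) + prob / 6
        law = new
    return law

def central_moment(law: dict, r: int) -> Fraction:
    """Mean r-th power of deviation from the arithmetic mean."""
    mean = sum(v * p for v, p in law.items())
    return sum((v - mean) ** r * p for v, p in law.items())

ratios = {}
for n in (1, 5, 25, 50):
    law = sum_of_dice_law(n)
    k = central_moment(law, 2)
    ratios[n] = central_moment(law, 4) / k ** 2   # equals 3 for the normal law
    print(n, float(ratios[n]))
```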

107. (2) The earliest and best-known proof is that which was
originated by Laplace and generalized by Poisson.^{[7]} Some idea of
this celebrated theory may be obtained from the following free
version, applied to a simple case. The case is that in which all the
elements have one and the same locus of frequency, and that locus
is symmetrical about the centre of gravity. Let the locus be represented
by the equation η = φ(ξ), where the centre of gravity is the
origin, and φ(+ξ) = φ(−ξ); the construction signifying that the
probability of the element having a value ξ (between, say, ξ − ½∆ξ and
ξ + ½∆ξ) is φ(ξ)∆ξ. Square brackets denoting summation between
extreme limits, put χ(*a*) for [Sφ(ξ)*e*^{√−1aξ}∆ξ] where ξ is an integer
multiple of ∆ξ (or ∆*x*) = ρ∆*x*, say. Form the *m*th power of χ(*a*).
The coefficient of *e*^{√−1ar∆x} in (χ(*a*))^{m} is the probability that the
sum of the values of the *m* elements should be equal to *r*∆*x*; a
probability which is equal to ∆*x*·*y*_{r}, where *y* is the ordinate of the
locus representing the frequency of the compound quantity (formed
by the sum of the elements). Owing to the symmetry of the
function φ the value of *y*_{r} will not be altered if we substitute
for *e*^{√−1ar∆x}, *e*^{−√−1ar∆x}, nor if we substitute ½(*e*^{+√−1ar∆x} + *e*^{−√−1ar∆x}),
that is, cos *ar*∆*x*. Thus (χ(*a*))^{m} becomes a sum of
terms of the form ∆*x*·*y*_{r} cos *ar*∆*x*, where *y*_{−r} = *y*_{+r}. Now multiply
(χ(*a*))^{m} thus expressed by cos *t*∆*xa*, where, *t* being an integer,
*t*∆*x* = *x*, the abscissa of the “error” the probability of whose
occurrence is to be determined. The product will consist of a sum
of terms of the form ½∆*x*·*y*_{r}(cos *a*(*r* + *t*)∆*x* + cos *a*(*r* − *t*)∆*x*). As
every value of *r* − *t* (except zero) is matched by a value equal
in absolute magnitude, −*r* + *t*, and likewise every value of
*r* + *t* is matched by a value −*r* − *t*, the series takes the form of a sum of
terms in cos *qa*∆*x*, together with the single term ∆*x*·*y*_{t}, where *q* has all possible integer values from 1
to the largest value of |*r*|^{[8]} increased by |*t*|; and the term free from
circular functions is the equivalent of ½∆*x*·*y*_{r} cos *a*(*r* + *t*)∆*x*, when
*r* = −*t*, together with ½∆*x*·*y*_{r} cos *a*(*r* − *t*)∆*x*, when *r* = +*t*, that is, of ½∆*x*·*y*_{−t} + ½∆*x*·*y*_{t} = ∆*x*·*y*_{t}. Now substitute
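for *a*∆*x* a new symbol β; and integrate with respect to β,
the thus transformed (χ(*a*))^{m} cos *t*∆*xa* between the limits β = 0 and
β = π. The integrals of all the terms which contain cos *q*β (*q* a positive integer)
will vanish, and there will be left surviving only π∆*x*·*y*_{t}.
We thus obtain

$$\pi\,\Delta x\,y_{t} = \int_{0}^{\pi}\left(\chi(\beta/\Delta x)\right)^{m}\cos t\beta\,d\beta.$$

Now change the independent variable to *a*; then, as *d*β = ∆*x*·*da*,

$$\pi\,\Delta x\,y_{t} = \Delta x\int_{0}^{\pi/\Delta x}\left(\chi(a)\right)^{m}\cos(at\Delta x)\,da.$$

Replacing *t*∆*x* by *x*, and dividing both sides by π∆*x*, we have

$$y = \frac{1}{\pi}\int_{0}^{\pi/\Delta x}\left(\chi(a)\right)^{m}\cos ax\,da.$$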
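The conclusion of this paragraph, that the ordinate is recovered as *y* = (1/π)∫₀^{π/∆x}(χ(*a*))^{*m*} cos *ax* *da*, can be verified on a case whose answer is known. In this minimal Python sketch the element is a hypothetical symmetric one taking the values −1, 0, +1 with probabilities ¼, ½, ¼, on a lattice with ∆*x* = 1, so that the upper limit is π; the sum of *m* such elements has the exact law C(2*m*, *m* + *x*)/4^{*m*}.

```python
import math

# Hypothetical symmetric element on a lattice with spacing Δx = 1:
# the values -1, 0, +1 occur with probabilities 1/4, 1/2, 1/4.
PHI = {-1: 0.25, 0: 0.5, 1: 0.25}

def chi(a: float) -> float:
    """χ(a) = S φ(ξ) e^{√-1 aξ}; by symmetry only the cosine part survives."""
    return sum(p * math.cos(a * xi) for xi, p in PHI.items())

def y_by_inversion(x: int, m: int, steps: int = 2000) -> float:
    """(1/π) ∫_0^π χ(a)^m cos(ax) da by the trapezoidal rule (Δx = 1)."""
    h = math.pi / steps
    total = 0.0
    for j in range(steps + 1):
        a = j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * chi(a) ** m * math.cos(a * x)
    return total * h / math.pi

def y_exact(x: int, m: int) -> float:
    """The sum of m such elements is Binomial(2m, 1/2) - m, so its law is known."""
    return math.comb(2 * m, m + x) / 4 ** m

m = 30
for x in (0, 3, 10):
    print(x, y_by_inversion(x, m), y_exact(x, m))
```

The trapezoidal rule suits this case because the integrand is a finite sum of cosines of integer multiples of *a*, for which that rule over [0, π] is exact apart from rounding once the number of steps is large enough; this is the discrete counterpart of the vanishing of the integrals of cos *q*β.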

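Now expanding the cos *a*ξ which enters into the expression for χ(*a*)
(the sine terms of *e*^{√−1aξ} cancelling by the symmetry of φ), we obtain

$$\chi(a) = [\mathrm{S}\,\phi(\xi)\,\Delta\xi] - \frac{a^{2}}{2!}\,[\mathrm{S}\,\phi(\xi)\,\xi^{2}\,\Delta\xi] + \frac{a^{4}}{4!}\,[\mathrm{S}\,\phi(\xi)\,\xi^{4}\,\Delta\xi] - \cdots$$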
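The expansion can be checked term by term. This minimal Python sketch takes as its symmetric element the deviation of a die's points from their mean 3.5 (an illustrative choice), computes χ(*a*) directly, and compares it with the series 1 − μ₂*a*²/2! + μ₄*a*⁴/4! − …, where μ_r = [Sφ(ξ)ξ^r∆ξ] is the element's mean *r*th power of deviation. The truncated series is close for small *a* and drifts as *a* grows.

```python
import math

# Element: deviation of a fair die's points from their mean 3.5, so that
# ξ takes the values ±1/2, ±3/2, ±5/2, each with probability 1/6.
PHI = {d / 2: 1 / 6 for d in (-5, -3, -1, 1, 3, 5)}

def mean_power(r: int) -> float:
    """S φ(ξ) ξ^r: the mean r-th power of deviation for the element."""
    return sum(p * xi ** r for xi, p in PHI.items())

def chi(a: float) -> float:
    """χ(a) computed directly; symmetry leaves only the cosine terms."""
    return sum(p * math.cos(a * xi) for xi, p in PHI.items())

def chi_series(a: float, terms: int = 3) -> float:
    """Truncated expansion 1 - μ2 a²/2! + μ4 a⁴/4! - ..."""
    return sum((-1) ** j * mean_power(2 * j) * a ** (2 * j) / math.factorial(2 * j)
               for j in range(terms))

for a in (0.05, 0.1, 0.2, 0.4):
    print(a, chi(a), chi_series(a), abs(chi(a) - chi_series(a)))
```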

Performing the summations indicated, we express χ(*a*) in terms of
the mean powers of deviation for an element. Whence χ(*a*)^{m} is
expressible in terms of the mean powers of the compound locus.
First and chief is the mean second power of deviation for the compound,
which is the sum of the mean second powers of deviation for
the elements, say *k*. It is found that the sought probability may be
equated to

$$\frac{1}{\pi}\int_{0}^{\pi/\Delta x} e^{-ka^{2}/2}\cos ax\,da + \frac{k_{2}}{4!\,\pi}\int_{0}^{\pi/\Delta x} a^{4}\,e^{-ka^{2}/2}\cos ax\,da + \cdots,$$

where *k*_{2} is the coefficient defined below.^{[9]} Here π/∆*x* may be replaced
by ∞, since the finite difference ∆*x* is small with respect to unity
when the number of the elements is large;^{[10]} and thus the integrals
involved become equatable to known definite integrals. If it
were allowable to neglect all the terms of the series but the first, the
expression would reduce to $\frac{1}{\sqrt{2\pi k}}\,e^{-x^{2}/2k}$, the normal law of error.
But it is allowable to neglect the terms after the first, in a first
approximation, for values of *x* not exceeding a certain range, the
number of the elements being large, and if the postulate above
enunciated is satisfied.^{[11]} With these reservations it is proved that
the sum of a number of similar and symmetrical elements conforms
to the normal law of error. The proof is by parity extended to the
case in which the elements have different but still symmetrical
frequency functions; and, by a bolder use of imaginary quantities,
to the case of unsymmetrical functions.
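The leading term and its range of validity can be examined numerically. In this minimal Python sketch the elements are hypothetical ones taking −1, 0, +1 with probabilities ¼, ½, ¼ (so that *k* = *m*/2), with *m* = 50 assumed for illustration. The function `leading_term` evaluates (1/π)∫₀^∞ *e*^{−*ka*²/2} cos *ax* *da*, with the infinite limit cut off where the integrand is negligible, and should agree with 1/√(2π*k*)·*e*^{−*x*²/2*k*}; the exact law of the sum is then compared with that normal law at increasing distances from the centre.

```python
import math

# Hypothetical element (assumed for illustration): values -1, 0, +1 with
# probabilities 1/4, 1/2, 1/4, so each has mean square of error 1/2.
M = 50            # number of elements
K = M * 0.5       # k: sum of the elements' mean squares of error

def exact(x: int) -> float:
    """Exact law for the sum of M such elements: C(2M, M + x) / 4^M."""
    return math.comb(2 * M, M + x) / 4 ** M

def normal_law(x: float, k: float = K) -> float:
    """1/sqrt(2 pi k) * exp(-x^2 / 2k)."""
    return math.exp(-x * x / (2 * k)) / math.sqrt(2 * math.pi * k)

def leading_term(x: float, k: float = K, steps: int = 4000) -> float:
    """(1/π) ∫_0^L e^{-k a²/2} cos(ax) da, with L = 12/sqrt(k) standing in for ∞."""
    upper = 12 / math.sqrt(k)
    h = upper / steps
    total = 0.0
    for j in range(steps + 1):
        a = j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * math.exp(-k * a * a / 2) * math.cos(a * x)
    return total * h / math.pi

for x in (0, 5, 10, 15, 20, 25):
    e, nrm = exact(x), normal_law(x)
    print(f"x={x:2d}  exact={e:.3e}  normal={nrm:.3e}  rel. error={abs(nrm - e) / e:.3f}")
```

Near the centre the relative error is a fraction of one per cent; several multiples of √*k* out it becomes large, which illustrates why the normal law is asserted only for values of *x* not exceeding a certain range.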

- [1] By the use of Stirling's and Bernoulli's theorems; see Todhunter, *History . . . of Probability*.
- [2] The statement includes the case of a linear function, since an element multiplied by a constant is still an element.
- [3] *E.g.* if the frequency-locus of each element were 1/{π(1 + *x*^{2})}, extending to infinity in both directions. But extension to infinity would not be fatal if the form of the element's locus were *normal*.
- [4] For a fuller exposition and a justification of many of the statements which follow, see the writer's paper on “The Law of Error” in the *Camb. Phil. Trans.* (1905).
- [5] *Loc. cit.* pt. i. § 1.
- [6] On this criterion of coincidence see Karl Pearson's paper “On the Systematic Fitting of Curves,” *Biometrika*, vols. i. and ii.
- [7] Laplace, *Théorie analytique des probabilités*, bk. ii. ch. iv.; Poisson, *Recherches sur la probabilité des jugements*. Good restatements of this proof are given by Todhunter, *History . . . of Probability*, art. 1004, and by Czuber, *Theorie der Beobachtungsfehler*, art. 38 and Th. 2, § 4.
- [8] The symbol | | is used to denote absolute magnitude, abstraction being made of *sign*.
- [9] Below, pars. 159, 160.
- [10] *Loc. cit.* app. I.
- [11] *Loc. cit.* p. 53 and context.