Page:EB1911 - Volume 22.djvu/406


aggregate is given by a corresponding term in the expansion of $(q + p)^n$, and by a well-known theorem[1] this term is approximately equal to $\frac{1}{\sqrt{2\pi npq}}\,e^{-\nu^2/2npq}$; where ν is the number of integers by which the term is distant from np (or an integer close to np); provided that ν is of (or <) the order √n. Graphically, let the sortition made for each element be represented by the taking or not taking, with respective frequency p and q, a step of length i. If a body starting from zero takes successively n such steps, the point at which it will most probably come to a stop is at npi (measured from zero); the probability of its stopping at any neighbouring point within a range of ±√n·i is given by the above-written law of frequency, νi being the distance of the stopping-point from npi. Put νi = x and $2npqi^2 = c^2$; then the probability may be written $\frac{i}{\sqrt{\pi}\,c}\,e^{-x^2/c^2}$, the factor i being the interval between adjacent stopping-points.
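The closeness of the approximation stated above can be checked numerically. The following is a minimal sketch, not part of the original article: the values n = 400 and p = 0.3 and the function names are illustrative assumptions, chosen so that ν stays within the order √n.

```python
import math

def binomial_term(n, p, r):
    """Exact term of the expansion of (q + p)^n: probability of exactly r successes."""
    q = 1.0 - p
    return math.comb(n, r) * p**r * q**(n - r)

def normal_term(n, p, nu):
    """The approximation 1/sqrt(2*pi*n*p*q) * exp(-nu^2 / (2*n*p*q))."""
    npq = n * p * (1.0 - p)
    return math.exp(-nu * nu / (2.0 * npq)) / math.sqrt(2.0 * math.pi * npq)

# Illustrative values only (assumptions, not taken from the text).
n, p = 400, 0.3
centre = round(n * p)          # the integer np, here 120

for nu in (-10, -5, 0, 5, 10):  # distances of the order sqrt(n) = 20 or less
    exact = binomial_term(n, p, centre + nu)
    approx = normal_term(n, p, nu)
    print(nu, exact, approx, abs(exact - approx) / approx)
```

The relative discrepancy grows with |ν|, which is why the text restricts ν to the order √n.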

104. It is a short step, but a difficult one, from this case, in which the element is binomial (heads or tails), to the general case, in which the element has several values according to the law of frequency, consisting, for instance, of the number of points presented by a randomly-thrown die. According to the general theorem, if Q is the sum[2] of numerous elements, each of which assumes different magnitudes according to a law of frequency, $z = f_r(x)$, the function f being in general different for different elements, the number of times that Q assumes magnitudes between x and x + ∆x in the course of N trials is $Nz\,\Delta x$, if $z = \frac{1}{\sqrt{2\pi k}}\exp\left[-\frac{(x-a)^2}{2k}\right]$; where a is the sum of the arithmetic means of all the elements, any one of which $a_r = [\int x f_r(x)\,dx]$, the square brackets denoting that the integrations extend between the extreme limits of the element's range, if the frequency-locus for each element is continuous, it being understood that $[\int f_r(x)\,dx] = 1$; and k is the sum of the mean squares of error for each element, $= \sum[\int \xi^2 f_r(a_r + \xi)\,d\xi]$, if the frequency-locus for each element is continuous, where $a_r$ is the arithmetic mean of one of the elements, and ξ the deviation of any value assumed by that element from $a_r$, ∑ denoting summation over all the elements. When the frequency-locus for the element is not continuous, the integrations which give the arithmetic mean and mean square of error for the element must be replaced by summations. For example, in the case of the dice above instanced, the law of frequency for each element is that it assumes equally often each of the values 1, 2, 3, 4, 5, 6. Thus the arithmetic mean for each element is 3.5, and the mean square of error $\{(3.5 - 1)^2 + (3.5 - 2)^2 + \text{&c.}\}/6 = 2.916$ (more exactly 35/12 = 2.9166…). Accordingly, the sum of the points obtained by tossing a large number, n, of dice at random will assume a particular value x with a frequency which is approximately assigned by the equation

$z = \frac{1}{\sqrt{2\pi n \cdot 2.916}}\exp\left[-\frac{(x - 3.5n)^2}{2n \cdot 2.916}\right].$

The rule equally applies to the case in which the elements are not similar; one might be the number of points on a die, another the number of points on a domino, and so on. Graphically, each element is no longer represented by a step which is either null or i, but by a step which may be, with an assigned probability, one or other of several degrees between those limits, the law of frequency and the range of i being different for the different elements.
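The dice example can be verified by building the exact frequencies of the sum and setting them beside the law of error with a = 3.5n and k = 2.916…n. This is a minimal sketch under stated assumptions: the number of dice (n = 30) and all function names are hypothetical, introduced only for illustration. Since `add_element` accepts any table of values and frequencies, the same routine would serve for dissimilar elements.

```python
import math

def add_element(dist, element):
    """Exact frequencies of (current sum + one more element), by direct summation."""
    out = {}
    for s, ps in dist.items():
        for v, pv in element.items():
            out[s + v] = out.get(s + v, 0.0) + ps * pv
    return out

def mean_and_mean_square(element):
    """Arithmetic mean a_r and mean square of error of a single element."""
    a_r = sum(v * p for v, p in element.items())
    k_r = sum((v - a_r) ** 2 * p for v, p in element.items())
    return a_r, k_r

def law_of_error(x, a, k):
    """z = 1/sqrt(2*pi*k) * exp(-(x - a)^2 / (2k))."""
    return math.exp(-(x - a) ** 2 / (2.0 * k)) / math.sqrt(2.0 * math.pi * k)

die = {v: 1.0 / 6.0 for v in range(1, 7)}
a_r, k_r = mean_and_mean_square(die)     # 3.5 and 35/12

n = 30                                   # illustrative number of dice (an assumption)
dist = {0: 1.0}
for _ in range(n):
    dist = add_element(dist, die)

a, k = n * a_r, n * k_r
for x in (90, 100, 105, 110, 120):
    print(x, dist[x], law_of_error(x, a, k))
```

Because the possible sums are spaced one unit apart, ∆x = 1 and z may be compared directly with the exact probability of each sum.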

105. Variant Proofs.—The evidence of these statements can only be indicated here. All the proofs which have been offered involve some postulate as to the deviation of the elements from their respective centres of gravity, their “errors.” If these errors extended to infinity, it might well happen that the law of error would not be fulfilled by a sum of such elements.[3] The necessary and sufficient postulate appears to be that the mean powers of deviation for the elements, the second (above written) and the similarly formed third, fourth, &c., powers (up to some assigned power), should be finite.[4]
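The failure contemplated above can be made explicit for the frequency-locus cited in note 3. The following worked check is an addition, not part of the original article; it relies on the standard integral for the mean value of $e^{\sqrt{-1}\,ax}$ under that locus, here called χ(a) by analogy with the notation of Laplace's method.

$$\chi(a) = \int_{-\infty}^{\infty} \frac{e^{\sqrt{-1}\,ax}}{\pi(1 + x^2)}\,dx = e^{-|a|}, \qquad \left(\chi(a/n)\right)^n = e^{-|a|}.$$

The arithmetic mean of n such elements therefore has exactly the same locus as a single element, however large n may be, so that no approach to the normal law occurs; the mean square of deviation of the element, $\int x^2\,dx/\pi(1 + x^2)$, is infinite, and the postulate is violated.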

106. (1) The proof which seems to flow most directly from this postulate proceeds thus. It is deduced that the mean powers of deviation for the proposed representative curve, the law of error (up to a certain power), differ from the corresponding powers of the actual locus by quantities which are negligible when the number of the elements is large.[5] But loci which have their mean powers of deviation (up to some certain power) approximately equal may be considered as approximately coincident.[6]
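The first step of this argument can be illustrated for the fourth power. For the normal law the mean fourth power of deviation is 3k², k being the mean second power; the sketch below computes the exact ratio of the mean fourth power to the square of the mean second power for the sum of n dice, and shows it approaching 3. The choice of dice as elements and the function names are assumptions made for illustration only.

```python
def add_element(dist, element):
    """Exact frequencies of (current sum + one more element)."""
    out = {}
    for s, ps in dist.items():
        for v, pv in element.items():
            out[s + v] = out.get(s + v, 0.0) + ps * pv
    return out

def mean_power_of_deviation(dist, power):
    """Mean of (value - arithmetic mean)^power over the locus."""
    centre = sum(v * p for v, p in dist.items())
    return sum((v - centre) ** power * p for v, p in dist.items())

def fourth_power_ratio(element, n):
    """(mean fourth power) / (mean second power)^2 for the sum of n elements."""
    dist = {0: 1.0}
    for _ in range(n):
        dist = add_element(dist, element)
    second = mean_power_of_deviation(dist, 2)
    fourth = mean_power_of_deviation(dist, 4)
    return fourth / second ** 2

die = {v: 1.0 / 6.0 for v in range(1, 7)}
for n in (1, 5, 20, 80):
    # The normal law gives exactly 3 for this ratio.
    print(n, fourth_power_ratio(die, n))
```

The difference from 3 falls off inversely as the number of elements, which is the sense in which it is negligible when that number is large.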

107. (2) The earliest and best-known proof is that which was originated by Laplace and generalized by Poisson.[7] Some idea of this celebrated theory may be obtained from the following free version, applied to a simple case. The case is that in which all the elements have one and the same locus of frequency, and that locus is symmetrical about the centre of gravity. Let the locus be represented by the equation η = φ(ξ), where the centre of gravity is the origin, and φ(+ξ) = φ(−ξ); the construction signifying that the probability of the element having a value ξ (between say ξ − ½∆ξ and ξ + ½∆ξ) is φ(ξ)∆ξ. Square brackets denoting summation between extreme limits, put χ(a) for $[S\,\phi(\xi)\,e^{\sqrt{-1}\,a\xi}\,\Delta\xi]$, where ξ is an integer multiple of ∆ξ (or ∆x) = ρ∆x, say. Form the mth power of χ(a). The coefficient of $e^{\sqrt{-1}\,ar\Delta x}$ in $(\chi(a))^m$ is the probability that the sum of the values of the m elements should be equal to r∆x; a probability which is equal to $\Delta x\,y_r$, where y is the ordinate of the locus representing the frequency of the compound quantity (formed by the sum of the elements). Owing to the symmetry of the function φ the value of $y_r$ will not be altered if we substitute for $e^{+\sqrt{-1}\,ar\Delta x}$, $e^{-\sqrt{-1}\,ar\Delta x}$, nor if we substitute $\frac{1}{2}(e^{+\sqrt{-1}\,ar\Delta x} + e^{-\sqrt{-1}\,ar\Delta x})$, that is $\cos ar\Delta x$. Thus $(\chi(a))^m$ becomes a sum of terms of the form $\Delta x\,y_r \cos ar\Delta x$, where $y_{-r} = y_{+r}$. Now multiply $(\chi(a))^m$ thus expressed by $\cos t\Delta x\,a$, where, t being an integer, t∆x = x, the abscissa of the “error” the probability of whose occurrence is to be determined. The product will consist of a sum of terms of the form $\Delta x\,y_r\,\frac{1}{2}(\cos a(r + t)\Delta x + \cos a(r - t)\Delta x)$. As every value of r − t (except zero) is matched by a value equal in absolute magnitude, −r + t, and likewise every value of r + t is matched by a value −r − t, the series takes the form $\Delta x \sum y_r \cos qa\Delta x + \Delta x\,y_t$, where q has all possible integer values from 1 to the largest value of |r|[8] increased by |t|; and the term free from circular functions is the equivalent of $\Delta x\,y_r\,\frac{1}{2}\cos a(r + t)\Delta x$, when r = −t, together with $\Delta x\,y_r\,\frac{1}{2}\cos a(r - t)\Delta x$, when r = +t. Now substitute for a∆x a new symbol β; and integrate with respect to β the thus transformed $(\chi(a))^m \cos t\Delta x\,a$ between the limits β = 0 and β = π. The integrals of all the terms which are of the form $\Delta x\,y_r \cos q\beta$ will vanish, and there will be left surviving only $\pi\Delta x\,y_t$. We thus obtain, as equal to $\pi\Delta x\,y_t$, $\int_0^{\pi} d\beta\,(\chi(\beta/\Delta x))^m \cos t\beta$. Now change the independent variable to a; then as dβ = da ∆x,

$\Delta x\,y_t = \frac{\Delta x}{\pi}\int_0^{\pi/\Delta x} da\,(\chi(a))^m \cos t\Delta x\,a.$

Replacing t∆x by x, and dividing both sides by ∆x, we have

$y_x = \frac{1}{\pi}\int_0^{\pi/\Delta x} da\,(\chi(a))^m \cos ax.$

Now expanding the cos aξ which enters into the expression for χ(a), we obtain

$\chi(a) = [S\,\phi(\xi)\Delta\xi] - \frac{1}{2!}[S\,\phi(\xi)\xi^2\Delta\xi]\,a^2 + \frac{1}{4!}[S\,\phi(\xi)\xi^4\Delta\xi]\,a^4 - \ldots$

Performing the summations indicated, we express χ(a) in terms of the mean powers of deviation for an element. Whence $(\chi(a))^m$ is expressible in terms of the mean powers of the compound locus. First and chief is the mean second power of deviation for the compound, which is the sum of the mean second powers of deviation for the elements, say k. It is found that the sought probability may be equated to $\frac{1}{\pi}\int_0^{\pi/\Delta x} da\,e^{-\frac{1}{2}ka^2}\cos ax - \ldots$, where $k_2$, which enters the terms after the first, is the coefficient defined below.[9] Here π/∆x may be replaced by ∞, since the finite difference ∆x is small with respect to unity when the number of the elements is large;[10] and thus the integrals involved become equatable to known definite integrals. If it were allowable to neglect all the terms of the series but the first the expression would reduce to $\frac{1}{\sqrt{2\pi k}}\,e^{-x^2/2k}$, the normal law of error. But it is allowable to neglect the terms after the first, in a first approximation, for values of x not exceeding a certain range, the number of the elements being large, and if the postulate above enunciated is satisfied.[11] With these reservations it is proved that the sum of a number of similar and symmetrical elements conforms to the normal law of error. The proof is by parity extended to the case in which the elements have different but still symmetrical frequency functions; and, by a bolder use of imaginary quantities, to the case of unsymmetrical functions.
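The inversion step of this proof, and the final reduction to the normal law, can be checked on a small symmetrical case. The sketch below is an illustration under stated assumptions, not Laplace's own computation: the element is assumed to take the values −1, 0, +1 with frequencies ¼, ½, ¼ (so that ∆x = 1 and χ(a) = ½ + ½ cos a), m = 50 elements are summed, and the integral from β = 0 to β = π is evaluated by the trapezoidal rule. All names are hypothetical.

```python
import math

def chi(a):
    """chi(a) for an element taking -1, 0, +1 with frequencies 1/4, 1/2, 1/4."""
    return 0.5 + 0.5 * math.cos(a)

def inversion_probability(m, t, steps=2000):
    """(1/pi) * integral from 0 to pi of chi(beta)^m * cos(t*beta) d(beta),
    evaluated by the trapezoidal rule (Delta x = 1, so this is y_t itself)."""
    h = math.pi / steps
    total = 0.0
    for j in range(steps + 1):
        beta = j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * chi(beta) ** m * math.cos(t * beta)
    return total * h / math.pi

def exact_probability(m, t):
    """Direct count: the sum of m such elements is distributed as
    (heads in 2m tosses of a fair coin) minus m."""
    return math.comb(2 * m, m + t) / 4 ** m

def normal_law(x, k):
    """1/sqrt(2*pi*k) * exp(-x^2 / (2k))."""
    return math.exp(-x * x / (2.0 * k)) / math.sqrt(2.0 * math.pi * k)

m = 50
k = m * 0.5   # mean second power of one element is (1/4)*1 + (1/4)*1 = 1/2

for t in (0, 2, 5, 8):
    print(t, inversion_probability(m, t), exact_probability(m, t), normal_law(t, k))
```

The first two columns agree to rounding error, since the integral reproduces the coefficient exactly; the third shows the first-approximation normal law, whose departure from the exact value is small within the range of x considered.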

  1. By the use of Stirling's and Bernoulli's theorems, Todhunter, History . . . of Probability.
  2. The statement includes the case of a linear function, since an element multiplied by a constant is still an element.
  3. E.g. if the frequency-locus of each element were $\frac{1}{\pi(1 + x^2)}$, extending to infinity in both directions. But extension to infinity would not be fatal, if the form of the element's locus were normal.
  4. For a fuller exposition and a justification of many of the statements which follow, see the writer's paper on “The Law of Error” in the Camb. Phil. Trans. (1905).
  5. Loc. cit. pt. i. § 1.
  6. On this criterion of coincidence see Karl Pearson's paper “On the Systematic Fitting of Curves,” Biometrika, vols. i. and ii.
  7. Laplace, Théorie analytique des probabilités, bk. ii. ch. iv.; Poisson, Recherches sur la probabilité des jugements. Good restatements of this proof are given by Todhunter, History . . . of Probability, art. 1004, and by Czuber, Theorie der Beobachtungsfehler, art. 38 and Th. 2, §4.
  8. The symbol || is used to denote absolute magnitude, abstraction being made of sign.
  9. Below, pars. 159, 160.
  10. Loc. cit. app. I.
  11. Loc. cit. p. 53 and context.