[Figure: image not reproduced in this transcription.]
subjecting the distances of the means from the line to some minimal condition. If the slope of RR is positive, we may say that large values of x are on the whole associated with large values of y; if it is negative, large values of x are associated with small values of y. Further, if the slope of RR to the vertical be given, we shall have a measure of a rough practical kind of the shift of the mean of an x-array when its type y is altered. The equation to RR consequently gives a concise and definite answer to two most important statistical questions. It is also evident that if the means of the arrays actually lie in a straight line (as in normal correlation), the equation to RR must be the equation to the line of regression.
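To make the notion of array means concrete, here is a minimal sketch. It assumes, as the text implies, that an x-array is the set of x values observed with one fixed value of y (its "type"). The function name `x_array_means` and the data are hypothetical. The means it returns rise with y, so a line through them would slope positively, yet they do not lie on a straight line.

```python
from collections import defaultdict


def x_array_means(pairs):
    """Group (x, y) pairs into x-arrays (all x observed with the same y)
    and return {y: (n, mean_x)}, ordered by the type y."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[y].append(x)
    return {y: (len(xs), sum(xs) / len(xs)) for y, xs in sorted(groups.items())}


# Hypothetical observations; y is the type of each x-array.
pairs = [(1, 0), (3, 0), (2, 1), (4, 1), (6, 1), (7, 2), (9, 2)]
means = x_array_means(pairs)
print(means)  # {0: (2, 2.0), 1: (3, 4.0), 2: (2, 8.0)}
```

The means 2, 4, 8 increase by unequal steps, so here the regression is not exactly linear, which is the situation the line RR is meant to summarise.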
Let n be the number of observations in any x-array, and let d be the horizontal distance of the mean of this array from the line RR. I propose to subject the line to the condition that the sum of all quantities like nd² shall be a minimum, i.e., I shall use the condition of least squares. I do this solely for convenience of analysis; I do not claim for the method adopted any peculiar advantage as regards the probability of its results. It would, in fact, be absurd to do so, for I am postulating at the very outset that the curve of regression is only exceptionally a straight line; there can consequently be no meaning in seeking for the most probable straight line to represent the regression.
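The condition can be sketched numerically. This is a minimal illustration, assuming the line RR is written x = a + b·y (that parametrisation, the helper names, and the data are assumptions; the page breaks off before the author's own equation). For each x-array of type y, d = (mean of x) − (a + b·y), and a, b are chosen to minimise the sum of nd² by the usual weighted normal equations. Because the scatter of each array about its own mean adds only a constant, the same a and b also come out of ordinary least squares of x on y over every observation, which the sketch checks.

```python
def fit_line_to_array_means(arrays):
    """Fit x = a + b*y minimising sum(n * d**2), where for the x-array of
    type y with n members and mean m, d = m - (a + b*y)."""
    total = sum(n for n, _ in arrays.values())
    y_bar = sum(n * y for y, (n, _) in arrays.items()) / total
    x_bar = sum(n * m for n, m in arrays.values()) / total
    sxy = sum(n * (y - y_bar) * (m - x_bar) for y, (n, m) in arrays.items())
    syy = sum(n * (y - y_bar) ** 2 for y, (n, _) in arrays.items())
    b = sxy / syy
    return x_bar - b * y_bar, b


def weighted_sum_sq(arrays, a, b):
    """The quantity being minimised: sum of n * d**2 over all x-arrays."""
    return sum(n * (m - (a + b * y)) ** 2 for y, (n, m) in arrays.items())


def ols_x_on_y(pairs):
    """Ordinary least squares of x on y using every individual observation."""
    k = len(pairs)
    x_bar = sum(x for x, _ in pairs) / k
    y_bar = sum(y for _, y in pairs) / k
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    b = sxy / syy
    return x_bar - b * y_bar, b


# Hypothetical x-arrays: {type y: (n, mean of x)}, and the raw pairs behind them.
arrays = {0: (2, 2.0), 1: (3, 4.0), 2: (2, 8.0)}
pairs = [(1, 0), (3, 0), (2, 1), (4, 1), (6, 1), (7, 2), (9, 2)]

a, b = fit_line_to_array_means(arrays)
print(round(a, 4), round(b, 4))  # 1.5714 3.0
```

Weighting each array by n matters: an unweighted fit to the three means would give arrays of very different sizes equal influence on the slope.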
Let x, y be a pair of associated deviations, let σ be the standard deviation of any array about its own mean, and let