Page:Sm all cc.pdf/51

From Wikisource

trends? I am not enough of a baseball buff to know, but I note that the 1921-1930 peak is dominated by Rogers Hornsby, who had the highest average in 7 of these 10 years. Often in such analyses, identification of a trend’s existence is the first step toward understanding it and, in some cases, toward preventing it.

Of course, substantial ‘noise’, or annual variation, is superimposed on these long-term trends. Later in this section, we will consider removal of such trends, but here we will take a simpler and less satisfactory approach: we will limit our data analysis to the time interval 1931-1990. We thereby omit the time intervals in which secular (temporal) trends were dominant. If this shorter interval still contains a slight long-term trend, that trend is probably too subtle to jeopardize our conclusions.

For 1931-1990 batting averages (Figure 8c), skewness is substantially less than for the larger dataset, and no points are flagged for rejection by Chauvenet’s criterion. The standard deviation is reduced by one third, but the 95% confidence limits are only slightly reduced, because the decrease in the number of points counteracts the improvement in standard deviation.
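The two checks mentioned here can be illustrated with a minimal sketch. It assumes the usual form of Chauvenet’s criterion (flag a point when fewer than half a point that deviant would be expected among N normally distributed values) and assumes confidence limits on the mean of the form critical value × standard deviation / √N, with a fixed normal critical value of 1.96 rather than a t value. The function names, the sample values, and the count of 90 points for the larger dataset are hypothetical; only the 60 points of 1931-1990 and the one-third reduction in standard deviation come from the text.

```python
import math
import statistics


def chauvenet_flags(values):
    """Return one boolean per value: True if Chauvenet's criterion flags it."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    flags = []
    for x in values:
        z = abs(x - mean) / sd
        # Two-tailed probability of a deviation at least this large,
        # assuming the values follow a normal distribution.
        p = math.erfc(z / math.sqrt(2))
        # Flag the point if fewer than 0.5 such points are expected among n.
        flags.append(n * p < 0.5)
    return flags


def ci_half_width(sd, n, critical=1.96):
    """Half-width of the confidence interval for the mean (assumed form)."""
    return critical * sd / math.sqrt(n)


# Illustrative values only, not the batting-average data.
sample = [10, 11, 9, 10, 12, 10, 11, 9, 10, 30]
flags = chauvenet_flags(sample)

# Effect of cutting the standard deviation by one third while dropping
# from a hypothetical 90 points to the 60 points of 1931-1990.
ratio = ci_half_width(2 / 3, 60) / ci_half_width(1.0, 90)
print(flags)
print(round(ratio, 3))
```

Under these assumptions the half-width shrinks by a factor of √(2/3), or about 18%, even though the standard deviation itself falls by 33%: the smaller N gives back part of the gain.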

Confining one’s analysis to a subsample of the entire dataset is a legitimate procedure, if one has objective grounds for defining the subset and if one does not apply subset-based interpretations to the overall population. Obviously it would be invalid to analyze a ‘subset’ such as batting averages less than 400. Will the 1991 maximum batting average be 347±15 as predicted by the 1931-1990 data, or will there be another Rogers Hornsby?