Page:Sm all cc.pdf/36


Although the entire subject of data rejection is controversial, an objective rejection criterion seems preferable to a subjective decision. One objective rejection criterion is Chauvenet’s criterion: a measurement can be rejected if the probability of obtaining it is less than 1/(2N), where N is the number of measurements. For example, if N=20 then a measurement must be so distant from the mean that the probability of obtaining such a value is less than 1/40, or 2.5%. Table 4 gives these cutoffs, expressed as the ratio of the observed deviation (di) to the standard deviation (σ), where the deviation from the mean is simply di = |xi − X|.

Table 4. Deviation from the mean required for exclusion of a data point according to Chauvenet’s criterion [Young, 1962].
N:     5     6     7     8     9     10    12    14    16    18    20
di/σ:  1.65  1.73  1.81  1.86  1.91  1.96  2.04  2.10  2.15  2.20  2.24

N:     25    30    40    50    60    80    100   150   200   400   1000
di/σ:  2.33  2.39  2.49  2.57  2.64  2.74  2.81  2.93  3.02  3.23  3.48
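The cutoffs in Table 4 can be approximated numerically. The sketch below assumes that the 1/(2N) probability is two-tailed under a normal distribution, so that 1/(4N) lies in each tail; this assumption is not stated in the text, but it yields values that agree with the table to within about 0.01. The function name chauvenet_cutoff is introduced here only for illustration.

```python
from statistics import NormalDist


def chauvenet_cutoff(n: int) -> float:
    """Approximate di/sigma cutoff for a sample of n measurements.

    Assumption: the rejection probability 1/(2n) is two-tailed,
    leaving 1/(4n) in each tail of a standard normal distribution.
    """
    return NormalDist().inv_cdf(1.0 - 1.0 / (4.0 * n))


# Print approximate cutoffs for a few of the sample sizes in Table 4.
for n in (5, 10, 20, 100, 1000):
    print(n, round(chauvenet_cutoff(n), 2))
```

For sample sizes not listed in Table 4, the same function gives an interpolated cutoff under the same assumption.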

What mean and standard deviation should one use in applying Chauvenet’s criterion? The calculated mean and especially standard deviation are extremely sensitive to extreme points. Including the suspect point in the calculation of X and σ substantially decreases the size of di/σ and thereby decreases the likelihood of rejecting the point. Excluding the suspect point in calculating the mean and standard deviation, however, is tantamount to assuming a priori what we are setting out to test; such a procedure often would allow us to reject extreme values that are legitimate parts of the sample population. Thus we must take the more conservative approach: the mean and standard deviation used in applying Chauvenet’s criterion should be those calculated including the suspect point.

If Chauvenet’s criterion suggests rejection of the point, then the final mean and standard deviation should be calculated excluding that point. In theory, one then could apply the criterion again, possibly reject another point, recalculate mean and standard deviation again, and continue until no more points can be rejected. In practice, this exclusion technique should be used sparingly, and applying it more than once to a single dataset is not recommended.
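The procedure described above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: it assumes a two-tailed normal cutoff, uses the sample (N-1) standard deviation, tests only the single most extreme point, and is applied once. The function name chauvenet_once and the data values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def chauvenet_once(data):
    """Apply Chauvenet's criterion a single time.

    Returns (kept_values, rejected_value); rejected_value is None
    when no point qualifies for rejection.
    """
    n = len(data)
    m = mean(data)   # mean calculated INCLUDING the suspect point
    s = stdev(data)  # sample standard deviation, also including it (assumption: N-1 form)
    # Assumed two-tailed normal cutoff: probability 1/(2n) split over both tails.
    cutoff = NormalDist().inv_cdf(1.0 - 1.0 / (4.0 * n))
    # Only the point farthest from the mean is a candidate for rejection.
    suspect = max(data, key=lambda x: abs(x - m))
    if s > 0 and abs(suspect - m) / s > cutoff:
        kept = list(data)
        kept.remove(suspect)
        return kept, suspect
    return list(data), None


# Hypothetical measurements with one extreme value.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1, 10.0, 12.5]
kept, rejected = chauvenet_once(measurements)
if rejected is not None:
    # Final statistics are calculated EXCLUDING the rejected point.
    print("rejected:", rejected)
    print("final mean:", mean(kept), "final std dev:", stdev(kept))
```

The function is deliberately not called in a loop, in keeping with the advice that the criterion should not be applied more than once to a single dataset.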

Often one waffles about whether or not to reject a data point even if rejection is permitted by Chauvenet’s criterion. Such doubts are warranted, for we shall see in later examples that Chauvenet’s criterion occasionally permits rejection of data that are independently known to be reliable. An alternative to data rejection is to use some of the nonparametric statistics of the next section, for they are much less sensitive to extreme values than parametric techniques are.
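The difference in sensitivity can be illustrated with a small comparison of the mean (parametric) and the median (nonparametric) before and after one extreme value is added. The data values below are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical measurements without and with one extreme value.
clean = [9.8, 9.9, 9.9, 10.0, 10.0, 10.0, 10.1, 10.1, 10.2]
with_extreme = clean + [12.5]

# How far each statistic moves when the extreme value is included.
shift_mean = mean(with_extreme) - mean(clean)
shift_median = median(with_extreme) - median(clean)

print("shift in mean:  ", round(shift_mean, 3))
print("shift in median:", round(shift_median, 3))
```

In this example the single extreme value moves the mean by a quarter of a unit, whereas the median does not move at all.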

Median, Range, and 95% Confidence Limits

Until now, we have used parametric statistics, which assume a normal distribution. Nonparametric statistics, in contrast, make no assumption about the distribution. Most scientific studies employ parametric, not nonparametric, statistics, for one of four reasons:

  • experimenter ignorance that parametric statistics should only be applied to normal distributions;
  • lack of attention to whether or not one’s data are normally distributed;
  • ignorance about nonparametric statistical techniques;
  • greater efficiency of parametric statistics.