99% confidence that the true mean is within the interval -0.23 to 0.27. Actually the true mean for this dataset is zero.
N: | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
t95: | 12.71 | 4.303 | 3.182 | 2.776 | 2.571 | 2.447 | 2.365 | 2.306 | 2.262 | 2.228 |
t99: | 63.66 | 9.925 | 5.841 | 4.604 | 4.032 | 3.707 | 3.499 | 3.355 | 3.250 | 3.169 |
N: | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
t95: | 2.201 | 2.179 | 2.160 | 2.145 | 2.131 | 2.120 | 2.110 | 2.101 | 2.093 | 2.086 |
t99: | 3.106 | 3.055 | 3.012 | 2.977 | 2.947 | 2.921 | 2.898 | 2.878 | 2.861 | 2.845 |
N: | 22 | 23 | 24 | 25 | 30 | 40 | 60 | 80 | 100 | ∞ |
t95: | 2.080 | 2.074 | 2.069 | 2.064 | 2.045 | 2.023 | 2.001 | 1.990 | 1.984 | 1.960 |
t99: | 2.831 | 2.819 | 2.807 | 2.797 | 2.756 | 2.713 | 2.662 | 2.640 | 2.627 | 2.576 |
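As a minimal sketch of how the table is used, the snippet below computes confidence limits for a mean as t times the standard error, with t read from the table for the given N. The function name `confidence_limits` and the sample measurements are hypothetical, only a few table entries are copied, and σ is assumed to be estimated by the sample standard deviation.

```python
import math
import statistics

# Two-tailed t values copied from the table above, keyed by number of measurements N.
T95 = {5: 2.776, 10: 2.262, 20: 2.093}
T99 = {5: 4.604, 10: 3.250, 20: 2.861}

def confidence_limits(data, t_table):
    """Return (mean, half_width), where half_width = t * s / sqrt(N)."""
    n = len(data)
    if n not in t_table:
        raise ValueError(f"no t value stored for N={n}")
    mean = statistics.mean(data)
    s = statistics.stdev(data)       # sample standard deviation (assumed estimate of sigma)
    std_err = s / math.sqrt(n)       # standard error of the mean
    return mean, t_table[n] * std_err

# Hypothetical measurements, for illustration only.
sample = [9.8, 10.4, 10.1, 9.6, 10.3]
mean, half95 = confidence_limits(sample, T95)
_, half99 = confidence_limits(sample, T99)
print(f"mean = {mean:.2f}, 95% limits = +/-{half95:.2f}, 99% limits = +/-{half99:.2f}")
```

The 99% limits are always wider than the 95% limits for the same data, because the tabulated t99 exceeds t95 at every N.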
Selection of a confidence level (α95, α99, etc.) usually depends on one’s evaluation of which risk is worse: the risk of incorrectly identifying a variable or effect as significant, or the risk of missing a real effect. Is the penalty for error as minor as having a subsequent researcher correct the error, or could it cause disaster such as an airplane crash? If prior knowledge suggests one outcome for an experiment, then rejection of that outcome needs a higher than ordinary confidence level. For example, no one would take seriously a claim that an experiment demonstrates test-tube cold fusion at the 95% confidence level; a much higher confidence level plus replication was demanded. Most experimenters use either a 95% or 99% confidence level. Tables for calculation of confidence limits other than 95% or 99%, called tables of the t distribution, can be found in any statistics book.
The standard error of the mean $\sigma_X$ is also the key to estimating how many measurements to make. The definition $\sigma_X = \sigma N^{-0.5}$ can be recast as $N = \sigma^2/\sigma_X^2$. Suppose we want to make enough measurements to obtain a final mean that is within 2 units of the true mean (i.e., $\sigma_X \le 2$), and a small pilot study permits us to calculate that our measurement scatter $\sigma \approx 10$. Then our experimental series will need $N \ge 10^2/2^2$, or $N \ge 25$, measurements to obtain the desired accuracy at the 68% confidence level (or $1\sigma_X$). For about 95% confidence, we recall that about 95% of points are within $2\sigma$ of the mean and conclude that we would need $2\sigma_X \le 2$, so $N \ge 10^2/1^2$, or $N \ge 100$ measurements. Alternatively and more accurately, we can use the t table above to determine how many measurements will be needed to assure that our mean is within 2 units of the true mean at the 95% confidence level ($\alpha_{95} \le 2$): we need $t_{95} = \alpha_{95}/\sigma_X = \alpha_{95} N^{0.5}/\sigma = 2N^{0.5}/10 = 0.2N^{0.5}$ to be greater than the $t_{95}$ in the table above for that N. By trying a few values of N, we see that $N \ge 100$ is needed.
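The arithmetic above can be sketched in a few lines. The function names `n_for_standard_error` and `smallest_tabulated_n` are hypothetical, and the search only tries the N values copied from the table, so its answer is no finer than the table's spacing (here, somewhere above 80 and at most 100).

```python
import math

# Two-tailed 95% t values copied from the table above, keyed by N.
T95 = {25: 2.064, 30: 2.045, 40: 2.023, 60: 2.001, 80: 1.990, 100: 1.984}

def n_for_standard_error(sigma, target):
    """Smallest N with sigma / sqrt(N) <= target, i.e. N >= sigma**2 / target**2."""
    return math.ceil(sigma**2 / target**2)

def smallest_tabulated_n(sigma, half_width, t_table):
    """First tabulated N for which half_width * sqrt(N) / sigma >= t(N)."""
    for n in sorted(t_table):
        if half_width * math.sqrt(n) / sigma >= t_table[n]:
            return n
    return None  # no tabulated N is large enough

sigma = 10.0  # measurement scatter estimated from the pilot study
print(n_for_standard_error(sigma, 2.0))       # 68% level, sigma_X <= 2      -> 25
print(n_for_standard_error(sigma, 1.0))       # ~95% level, 2 * sigma_X <= 2 -> 100
print(smallest_tabulated_n(sigma, 2.0, T95))  # t-table check                -> 100
```

At N = 80 the quantity 0.2 × 80^0.5 ≈ 1.79 falls short of the tabulated 1.990, while at N = 100 it equals 2.0 and exceeds 1.984, which is why the search settles on 100.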
As a rule of thumb, one must quadruple the number of measurements in order to double the precision of the result. This generalization is based on the $N^{0.5}$ relationship of standard deviation to standard error and is strictly true only if our measure of precision is the standard error. If, as is of-