When we have 50 or 100 measurements instead of 20, we find that a finer histogram-binning interval is better for visualizing the pattern of the data. Figure 2 shows that an interval of about 0.5 is best for 100 measurements of this data type.
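The effect of sample size on bin choice can be illustrated with a short sketch. This is not the book's dataset or figure; it uses simulated, normally distributed measurements (NumPy and Matplotlib, with an arbitrary mean of 10 and standard deviation of 1) purely to show how a finer bin interval such as 0.5 resolves the pattern at N = 100, while a coarser interval suits N = 20.

    # Sketch only: simulated data standing in for the measurements discussed above.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    fig, axes = plt.subplots(1, 2, figsize=(8, 3))
    for ax, (n, width) in zip(axes, [(20, 1.0), (100, 0.5)]):
        data = rng.normal(loc=10.0, scale=1.0, size=n)   # hypothetical measurements
        bins = np.arange(data.min(), data.max() + width, width)
        ax.hist(data, bins=bins)
        ax.set_title(f"N = {n}, bin width = {width}")
        ax.set_xlabel("measured value")
        ax.set_ylabel("count")
    plt.tight_layout()
    plt.show()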

Normal Distribution

The data shown in Figures 1 and 2 have what is called a normal distribution. Such a distribution is formally called a Gaussian distribution and informally called a bell curve. The normal distribution has both a theoretical and an empirical basis. Theoretically, we expect a normal distribution whenever some parameter or variable X has many independent, random causes of variation and several of these so-called ‘sources of variance’ have effects of similar magnitude. Even if an individual type of error is not normally distributed, the combined effect of many such errors tends to be. Empirically, countless types of measurements in all scientific fields exhibit a normal distribution. Yet we must always verify the assumption that our data follow a normal distribution. Failure to test this assumption is scientists’ most frequent statistical pitfall. This mistake is needless, because one can readily examine a dataset to determine whether or not it is normally distributed.
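As a sketch of how one might readily examine a dataset for normality (the specific test is a choice made here, not the author's procedure), the Shapiro-Wilk test in SciPy returns a p-value; small values indicate departure from normality. The data below are simulated.

    # Sketch only: a Shapiro-Wilk normality check on simulated measurements.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    data = rng.normal(loc=10.0, scale=1.0, size=100)   # hypothetical dataset

    statistic, p_value = stats.shapiro(data)
    print(f"W = {statistic:.3f}, p = {p_value:.3f}")
    if p_value < 0.05:
        print("Evidence against normality; treat mean and standard deviation with caution.")
    else:
        print("No evidence against normality at the 0.05 level.")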

Mean and Standard Deviation

For any dataset that follows a normal distribution, regardless of dataset size, virtually all of the information is captured by only three variables:

N: the number of data points, or measurements;

X̄: the mean value; and

σ: the standard deviation.

The mean (X̄), also called the arithmetic mean, is an average appropriate only for normal distributions. The mean is defined as:
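In conventional notation (a sketch of the standard definition, not necessarily the symbols the text uses further on), the arithmetic mean of N measurements X_1, ..., X_N is

\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i

A numerical equivalent of the three summary variables above, again as a sketch with simulated data; the N − 1 denominator used for σ here is an assumption, since the text's definition of the standard deviation is not reproduced in this passage.

    # Sketch only: computing N, the mean, and the standard deviation with NumPy.
    import numpy as np

    rng = np.random.default_rng(2)
    data = rng.normal(loc=10.0, scale=1.0, size=100)   # hypothetical measurements

    n = data.size              # N: number of measurements
    mean = data.mean()         # X-bar: arithmetic mean
    sigma = data.std(ddof=1)   # sigma: sample standard deviation (N - 1 denominator, an assumption)

    print(f"N = {n}, mean = {mean:.2f}, sigma = {sigma:.2f}")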