Page:Popular Science Monthly Volume 60.djvu/106

THE POPULAR SCIENCE MONTHLY.

consideration the application of the method has been restricted to a study of the relative frequencies of the use of words of different lengths.

The method of procedure is simple and will be best explained by an example. One thousand words in 'Vanity Fair,' taken in consecutive order, of course, were counted and classified as to the number of letters in each, with the following result:

Letters:   1    2    3    4    5    6    7    8    9   10   11   12   13   14
Words:    25  169  232  187  109   78   79   48   28   20   10   10    2    3
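The counting described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for the table: the function name `word_length_counts`, its parameters, and the rule that a word is any unbroken run of letters (so that apostrophes and hyphens divide words) are all assumptions made here, and the sample sentence is merely borrowed from the text for demonstration.

```python
import re
from collections import Counter

def word_length_counts(text, group_size=1000, start=0):
    """Tally one consecutive group of words by the number of letters in each."""
    # Assumption: a "word" is an unbroken run of letters A-Z or a-z.
    words = re.findall(r"[A-Za-z]+", text)
    # Take the words in consecutive order, beginning at position `start`.
    group = words[start:start + group_size]
    return Counter(len(word) for word in group)

# Hypothetical sample; a real trial would use a full group of 1,000 words.
sample = ("The method of procedure is simple and will be best "
          "explained by an example")
counts = word_length_counts(sample)

for letters in sorted(counts):
    print(letters, counts[letters])
```

A second group "immediately following" the first would, under the same assumptions, be obtained by calling the function again with `start=1000`.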

The graphic exhibition of this result is made by the well-known method of rectangular coordinates, using the number of letters in a word as the abscissa and the corresponding number of words in a thousand as the ordinate. On a sheet of 'squared' paper the numbers showing letters in each word, 1, 2, 3, 4, etc., are placed along the horizontal line and on the vertical above each of these is put a point whose distance from the base shows the number of corresponding words in every thousand, according to the scale shown at the left. These points are then joined by straight lines and the whole broken line may be called the 'word spectrum' or 'characteristic curve' of the author as derived from the group of words considered. The group of 1,000 words from 'Vanity Fair' enumerated above is thus graphically represented by the continuous line in Fig. 1, and the method of constructing the characteristic curve will be readily understood by comparing this with the numbers given. As a thousand is a very small number in a problem of this kind, the curve representing any single group of that number of words is practically certain to differ more or less from that of any other such group. In Fig. 1 the dotted line represents a group of 1,000 words, immediately following that already referred to. Perhaps the most astonishing thing about these two lines is not that they differ, but that they agree as well as they do. It is really remarkable that any marked peculiarity in the use of words is almost sure to be revealed in this way, even in comparatively small groups. In the two diagrams of Fig. 1 it is interesting to note their general sameness, especially as shown in a tendency to equality of words of six and seven letters and also in words of eleven and twelve letters.

Fig. 1. Two Groups, 1000 each, Vanity Fair.
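The construction of the characteristic curve may likewise be sketched in Python, using the numbers given for the first group from 'Vanity Fair.' The names `vanity_fair` and `characteristic_curve` are chosen here for illustration only, and the rows of `#` marks are a rough stand-in for the squared paper, at an arbitrary scale of one mark to ten words in a thousand. The second (dotted) group is not reproduced, as its numbers are not given in the text.

```python
# Counts from the 'Vanity Fair' table: letters in a word -> number of words.
vanity_fair = {1: 25, 2: 169, 3: 232, 4: 187, 5: 109, 6: 78, 7: 79,
               8: 48, 9: 28, 10: 20, 11: 10, 12: 10, 13: 2, 14: 3}

def characteristic_curve(counts):
    """Return (letters, words per thousand) points, ordered along the abscissa."""
    total = sum(counts.values())
    longest = max(counts)
    # Lengths with no words are placed at zero so the broken line has no gaps.
    return [(n, 1000 * counts.get(n, 0) / total) for n in range(1, longest + 1)]

points = characteristic_curve(vanity_fair)

for letters, per_thousand in points:
    # Crude text rendering: one '#' for every ten words in a thousand.
    bar = "#" * round(per_thousand / 10)
    print(f"{letters:2d} {per_thousand:6.1f} {bar}")
```

Joining successive points by straight lines, as the text directs, gives the broken line of the figure; the near equality of the ordinates at six and seven letters (78 and 79) and at eleven and twelve letters (10 and 10) can be read directly from the printed values.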

When the number of words in each group is increased there is, of