Popular Science Monthly/Volume 65/August 1904/Shorter Articles and Discussion
CHARACTERISTIC CURVES OF COMPOSITION.
In the June number of this journal there appeared an interesting article by Dr. Robert E. Moritz, on 'The Significance of Characteristic Curves of Composition,' mostly devoted to an examination and criticism of some conclusions stated by me in a paper published nearly twenty years ago and practically applied in another paper published in 1901. To those who have had enough interest in this somewhat curious application of the doctrine of chance to read all of these papers carefully no comment upon or reply to the criticism of Dr. Moritz need be addressed, but in these piping times everybody is so busy preparing his own papers for the press that he has time only to glance at the results of the intellectual activity of others, and it has become a common, indeed, almost necessary habit to make a hurried hunt for the conclusions of scientific investigations of a subject a little out of one's own field and to accept them when found for lack of time to do otherwise. For this reason I will invite attention to one or two facts having an important bearing upon the question at issue. The assumption of Dr. Moritz is that the form of what I have called the characteristic curve of a composition, plotted as first described twenty years ago, will depend more on what he calls the form of composition (character, as to subject matter, etc.) than upon any personal peculiarities of the author. He believes this, the form of composition, 'to be the predominating factor overshadowing all others' and that 'conclusions regarding the authorship of spurious or disputed writings based upon a comparison of the word curves of work differing either in form of composition or in other essential respects must be considered worthless.' After (not before) making these and other equally sweeping assertions, he sets forth the evidence by which he believes they are supported.
The principal part of this evidence is an exhibition of results of a series of 'word-countings' of various authors which he has made, from which results he deduces the conclusions quoted above.
Unfortunately these conclusions are of no value whatever because the observations on which they are founded are totally inadequate and, indeed, are specifically 'ruled out' in the very beginning by the author himself in a quotation from my earlier paper. In this it was declared that a count of '100,000 words would be necessary and sufficient to furnish the characteristic curve of a writer,' and yet, in the face of this statement, Dr. Moritz proceeds to make his sweeping deductions from groups including 1,000, 5,000 (generally) and in the case of one author two groups of 15,000 words each! He puts the curves of the two latter, including only 30,000 words in all, by the side of the Bacon-Shakespeare diagram which includes 600,000 words (not less than 100,000 being 'necessary') and then makes the charming comment upon the latter that, 'instead of furnishing a convincing proof or even contributory evidence, leaves the problem of disputed authorship wholly untouched!' In this case the value of the evidence depends on some power higher than the first of the number of words, but even if directly proportional it would be twenty in favor to one against, and it is difficult to believe the author serious in condemning so positively and confidently the evidence of a 'characteristic curve' when he is so very, very far from ever having made one. But serious he seems to have been and perhaps never more so than when he declares that the 'average word length alone ... would, in general, be indicative of the nature of the curve.' This is equivalent to saying that the form of a curve is known when its mean ordinate is known, and is a statement which, to those who are accustomed to the graphic representation of variables, will betray an almost immeasurable unfamiliarity with problems of this kind.
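The remark that an average does not fix the form of a curve can be illustrated with a minimal sketch in Python. The two curves below are invented for illustration only and are not counts from any author; the helper names `mean_word_length` and `most_frequent_length` are likewise hypothetical. Each curve gives the number of words per thousand at each word length.

```python
from fractions import Fraction


def mean_word_length(curve):
    """Exact mean word length of a curve given as {length: words per 1000}."""
    total_words = sum(curve.values())
    total_letters = sum(length * count for length, count in curve.items())
    return Fraction(total_letters, total_words)


def most_frequent_length(curve):
    """Word length at which the curve reaches its highest ordinate."""
    return max(curve, key=curve.get)


# Two invented curves (assumed data, words per 1,000 at each word length).
curve_a = {2: 200, 3: 300, 4: 200, 5: 100, 6: 100, 7: 100}
curve_b = {1: 100, 2: 150, 3: 200, 4: 250, 5: 150, 7: 50, 8: 100}

for name, curve in (("A", curve_a), ("B", curve_b)):
    print(name, float(mean_word_length(curve)), most_frequent_length(curve))
```

Both invented curves have the same mean word length, yet one peaks at three-letter words and the other at four-letter words, and they cover different ranges of word length. Knowing the average alone would not tell the two apart.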
Among other evidences of this state of mind which might be cited, the construction of a 'typical word-curve of extreme light dialogue'—from a count of 5,000 words from Swift's 'Polite Conversation'—is not the least convincing. To produce this Swift's curve is 'corrected' by the suppression of certain words of seven or eight letters, for no assigned or imaginable reason, except that perhaps Dr. Moritz thinks that Swift ought to have known better than to have used them. The curve of this expurgated edition of 5,000 words from Swift is interesting in form, but if it be the 'typical word-curve of extreme light dialogue' in the English language, as declared by Dr. Moritz, those who have dabbled, even a very little, in word-counting of modern comedy and humorous story writers will be saddened by the thought that the art of composing 'extreme light dialogue' must have long ago become extinct.
It seems impossible to avoid the conclusion that Dr. Moritz, perhaps as a result of a somewhat hasty examination of the subject, has failed to grasp in its entirety the fundamental principle on which the whole doctrine (if so dignified a term may be used) of 'characteristic curves of composition' is based, and a brief exposition of its most important propositions may not be out of place.
The notion that every author, however voluminous, must necessarily be restricted in his use of words to a vocabulary which would remain sensibly constant after his productive period had been reached, which, in its character and extent would be one of the personal 'qualities' of that author and thus offer a means of identification, is due, as is stated in the paper of twenty years ago, to Augustus De Morgan, who suggested that vocabularies might differ so much among different authors as to make it possible to differentiate them by means of the simple average number of letters in a word. In making some tests of this proposition it immediately became evident, as might have been anticipated, that vocabularies might differ indefinitely and enormously and at the same time agree in average word-length. The scheme for the graphic display of variations in the average frequency of occurrence of words of different lengths, as explained in the papers under discussion, was then devised and proved to be a vastly more powerful means of revealing peculiarities in composition. As to the value of this method of treatment, which is the one original feature of the whole, there seems to be no question, as even my critic has paid me the highest compliment in his power in making continued and apparently confiding use of it. The point at issue is, rather: Was De Morgan right in assuming that the personal element enters into the vocabulary of any author to such an extent as to furnish a means of identifying his writing? 
He evidently believes that it played so large a part as to determine the average length of words used; the theory of 'characteristic curves' implies that personality may determine the way in which words are used rather than their average length, and it furnishes a method for revealing peculiarities, such as persist in the long run (this is the kernel of the thing) in the relative frequencies of words of different numbers of letters, syllables, etc., of sentences of different lengths or of any 'qualities' that may be treated numerically. Because of simplicity, ease of application and probable greater certainty of result, the element of word-length was that to which attention was first given. Besides, there is in this another important advantage which will be presently explained.
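The counting behind such a curve can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of the papers under discussion: the rule for what counts as a word (a run of letters, with apostrophes dropped first), the scaling to words per thousand, and the function names `word_length_counts` and `characteristic_curve` are all choices made here for the example.

```python
import re
from collections import Counter


def word_length_counts(text):
    """Count how many words of each length occur in the text.

    Assumption: a word is a maximal run of ASCII letters, and apostrophes
    are removed first, so that "don't" counts as one four-letter word.
    """
    words = re.findall(r"[A-Za-z]+", text.replace("'", ""))
    return Counter(len(word) for word in words)


def characteristic_curve(text):
    """Return {word_length: words per 1000}, sorted by word length."""
    counts = word_length_counts(text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: 1000 * counts[length] / total for length in sorted(counts)}


sample = "The cat sat on the mat."
print(word_length_counts(sample))
print(characteristic_curve(sample))
```

A six-word sample is of course far too small to mean anything; the sketch only shows the mechanical nature of the count, which is the same whether the text runs to six words or to several hundred thousand.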
Now, in spite of an oft-quoted assertion to the contrary, words are used to express ideas, and the particular words used will depend largely, perhaps most largely, on the idea to be expressed; but they will also depend on the person to whom they are addressed; on the conditions under which they are spoken (as in private conversation, public address, etc.) or written (in correspondence, for publication in newspapers, journals or books); especially on the age or period in which their author lived; and perhaps on a thousand other things; but in any or in all of these cases they are determined by the person who uses them. The theory assumes that by combining a sufficiently large number of verbal expressions of any author, variations due to these thousand and one causes may be eliminated and that due to the personality of the author, the only one in fact which is common to them all, may stand revealed.
Moreover it is clear that in selecting from many personal idiosyncrasies of an author that which will best serve for purposes of identification, it will be wise to choose one of which he is himself unconscious, for this would most certainly persist and prevail in everything he wrote, never being either shunned or encouraged. While an author will often give thought to the arrangement of words and sentences and to other features of composition, he will almost never stop to consider the number of letters in the words which he uses, and, therefore, such personal peculiarity as may be shown by word length frequency curves is almost certain to be persistent.
Dr. Moritz is quite right in his belief that 'form of composition' (subject matter, etc.) affects very powerfully the form of a word-curve, seeming, at first, to conceal the element of personality, just as in the physical world local, near-at-hand causes seem in their effects to overshadow and conceal those of remote but more constant origin.
But it must never be forgotten that they only conceal, they do not destroy; they may over-shadow, but they can not obliterate. The position of a freely suspended magnetic needle does not, at first sight, appear to be in any way related to the phases of the moon, but a very long series of observations reveals this relation very clearly and certainly, although local happenings affect it to a much greater degree and apparently conceal the more persistent influence of a more remote cause. It is the constancy of this influence, the fact that it is present all the time, which makes it possible to differentiate it from others, often exceeding it in magnitude, but less regular in operation. And so it is in the long run that the personal peculiarities of an author, especially the unconscious peculiarities, will be revealed, and the so-called 'characteristic curve of composition' was suggested as a means of developing and displaying one of them.
The question under discussion includes, therefore, two parts: First, does the personality of an author enter into his composition so as to affect its purely mechanical aspect in a way of which he is quite unconscious?—and, second, does the 'characteristic curve' furnish a means of exhibiting such peculiarities of composition if they exist? It is not likely that anybody, after a little reflection and investigation, will be willing to say 'no!' to the first, although few people fully recognize how steadily and surely one is influenced by a bias of which one is totally unconscious. Although such unrecognized influences are often very feeble, their very persistency, the fact that they 'never sleep,' may give them so large a place in the final summing up of any series of operations that they determine the distinctive characteristics of the operator and give 'personal quality' to the work itself. About twenty-five years ago, being much interested in problems of this kind (more than at the present moment), I spent many 'odd' and generally otherwise unusable quarter and half hours in pitching a stick into the air and noting whether it fell across any of a series of parallel lines drawn on a plane surface upon which it dropped. This cheerful occupation was continued until the stick had been thrown 20,000 times, and then the number of times it had fallen upon a line was compared with the number indicated by theory. It so happens that according to theory this experiment ought to determine the value of that important constant, the ratio of the circumference to the diameter of a circle, and in the present instance it promised for a time to do this in a most satisfactory manner. Indeed at the end of about 12,000 throws the value of 'π' was determined correctly to three decimal places and nearly correctly in the fourth. 
But from this time on the graphically constructed line of the experiment began to depart very slightly but very persistently from the line of theory and continued to do so to the end. The explanation was easy, it being evident that the operation which was intended to be purely mechanical was not so. There was present an unconscious personal element which interfered with the regularity of the work, to a very minute degree, it is true, but the effect of which became manifest when the run was long. The deviation was due to a, perhaps, very rarely occurring error of judgment in determining a single fact of the experiment, but in the long run these errors leaned towards one side, and this was beautifully revealed in the graphic exhibit of the whole series. I do not recommend the process as a means of determining errors of this kind, for it is altogether too laborious, and besides, I have not found it necessary, kind friends having generally kept me well informed as to my errors in judgment. What I want to illustrate and emphasize is the importance of my being unconscious of this bias, which otherwise would have destroyed the value of the whole experiment. It is the 'unconscious touch' which most surely identifies personality in any artistic performance. It is likely that Raphael never meant to paint two Madonnas alike, indeed it is likely that he would generally make some effort to have each different from all that he had done before, but all have something in common, unsuspected by the artist but known to the expert and furnishing a practically sure means of identification. Moreover, these unconscious technicalities of an artist, the key to identification, are most frequently known and utilized by persons who have little knowledge and less appreciation of the real artistic qualities of the works which they compare (see Ruskin on the identification of old masters), the operation being the more certain as it is more purely mechanical.
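The stick-throwing experiment described above is the classical needle problem, and a simulation of it can be sketched in Python. For a stick of length l no longer than the spacing d of the lines, theory gives the chance of crossing a line as 2l/(πd), so that π may be estimated as 2l × throws / (d × crossings). Everything specific in the sketch is an assumption made for illustration: the stick length, the spacing, the seed, and above all the form of the bias, which is modelled here as a fixed small chance (`misjudge_rate`) that a throw missing every line is nevertheless recorded as a crossing. The text does not say what the actual error of judgment was. The function name `estimate_pi` is hypothetical, and the simulation uses `math.pi` only to draw a uniformly random angle.

```python
import math
import random


def estimate_pi(throws, stick=1.0, spacing=1.0, misjudge_rate=0.0, seed=0):
    """Estimate pi from simulated throws of a stick onto parallel lines.

    misjudge_rate is an assumed one-sided recording error: the chance that
    a throw which missed every line is written down as a crossing.
    """
    if stick > spacing:
        raise ValueError("this sketch assumes stick <= spacing")
    rng = random.Random(seed)
    crossings = 0
    for _ in range(throws):
        # Distance from the stick's centre to the nearest line, and the
        # acute angle between the stick and the lines.
        centre = rng.uniform(0.0, spacing / 2)
        angle = rng.uniform(0.0, math.pi / 2)
        crossed = centre <= (stick / 2) * math.sin(angle)
        if not crossed and rng.random() < misjudge_rate:
            crossed = True  # the observer's judgment leans to one side
        crossings += crossed
    return 2 * stick * throws / (spacing * crossings)


print(estimate_pi(200_000))                      # careful observer
print(estimate_pi(200_000, misjudge_rate=0.05))  # slightly biased observer
```

With a one-sided error of this kind the estimate settles on a value below π instead of wandering about it, and no number of further throws brings it back. The assumed rate of 5 per cent is deliberately exaggerated so that the effect is plain in a short run; a much rarer error would need a much longer run to show itself.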
It is within the memory of most of those who will read this that the result of a national election together with the whole character and policy of the national government narrowly escaped being determined by the skillful introduction of a single phrase of only three words into a letter which afterwards proved to be a forgery. So characteristic of the alleged author was this phrase that at first even those who knew him best were reluctant to deny its authenticity. And yet I have excellent reasons for believing that the distinguished statesman whose splendid career was thus imperilled was entirely unconscious of the uncommon frequency with which this phrase occurred in his speaking and in his writing. It is because the scheme of the 'characteristic curve' lends itself to the development in a purely mechanical way of idiosyncrasies of which the author must be unconscious that it is thought to have some value as a means of identification. It may be that this assumption has not yet been proved, but in view of what has been said and even of what was said twenty years ago, it can not be said to have been disproved.
If instead of making deductions from groups of 1,000 to 5,000 words, Dr. Moritz had declared his belief that even 100,000 was too small a number for a perfectly definite characteristic curve the statement would have been well worth consideration; but it is difficult to doubt the evidence of diagrams exhibiting the word curves of several of the principal writers of Shakespeare's time, published in this journal, December, 1901, nearly all of which are based on counts of over 100,000 words each, and especially the very remarkable agreement, amounting to practical identity of the two curves from Shakespeare, each including about 200,000 words; the almost equally close agreement of two curves of 75,000 words each from Ben Jonson; and the striking difference between the latter and the curve of Shakespeare, although the 'form of composition' is the same in both, a fact directly opposed to Dr. Moritz's conclusion from a few groups of 5,000 words each. After the reader has examined the close agreement of these large groups from the same author, he may consider the contrasted curves of Bacon and Shakespeare, representing the counts of over a half million words, and, as far as I am concerned, he is still 'at liberty to draw any conclusion he pleases.'
Dr. Moritz's studies of the influence of 'form of composition' on the word curve are instructive and it is to be hoped that he will have the patience and courage to continue them. When his word counting, instead of including only a few thousands, shall have reached a million or two, and these of not more than a half dozen authors, what he may have to say upon the subject will be listened to with interest.
T. C. Mendenhall.
June 24, 1904.
- It is interesting to compare diagrams 8, 11 and 14 of his paper, to note the general agreement of the two curves for each author, the general and, indeed, striking differences among the three authors (which would have been much more evident in means of the several pairs) and to inquire if he has even correctly interpreted his own diagrams?