The Block-Design Tests
BY S. C. KOHS
A brief presentation of the Block-design Tests will be attempted in this article. These tests fall in the category of 'performance tests' and have been standardized to measure intelligence. They have been purposely devised to eliminate the factor of language. In this attempt the writer believes he has been especially successful since the instructions themselves may be given entirely through pantomime and imitation.
There has indeed been, and there still is, a great need for tests such as are here presented. In the longer monograph which the writer is preparing for publication, there will be more detailed treatment of many topics, such as the definition of intelligence; an analytic criticism of current methods of standardization; suggested newer statistical procedure; the relation between language ability, performance and intelligence; and other pertinent material.
The content of the present article has been divided into five sections:
(A) The Test Material:
- 1. The Blocks.
- 2. The Designs.
(B) The Directions for Applying the Tests:
- 1. For Subjects Who Can Understand Spoken Language.
- 2. For Subjects Who Do Not Know the Names of the Colors.
- 3. For Subjects Who Cannot Understand Spoken Language.
(C) The Score Card and Methods of Scoring.
(D) The Norms.
(E) The Reliability of the Tests.
In the promised monograph more complete details will be presented which would be out of place in this brief article.
The Block-Design Test.
(A) THE TEST MATERIAL
I. The Blocks
The Blocks which are used are manufactured by the Embossing Co., and may be secured at any of the large department stores and at various distributing centers of Milton Bradley's. There are sixteen cubes of one inch dimension and all are painted as follows:
- One side red
- One side blue
- One side white
- One side yellow
- One side blue and yellow (divided on the diagonal)
- One side red and white (divided diagonally)
The character of the colors is indicated on the page of designs (pp. 360−1) in this article. A slight difficulty experienced by possibly one or two subjects out of every 100 was a just perceptible but nevertheless disconcerting difference in shade between the blue and yellow on the full faces and the same colors on the diagonal sides. This can be remedied in the later standardization of the test material. One set of the blocks will last through the examination of from four to five hundred children without showing much wear and tear. After that the cubes can be repainted without difficulty.
It is interesting to watch the response of children and even adults when they are given colored cubes to handle. There is no doubt that an appeal exists which touches the roots of some very fundamental original tendencies. Of all the subjects tested, not one has manifested any absence of a desire to combine these cubes in some fashion. The experimenter needs only to direct this natural interest toward a specific end and then apply a scientific measuring technique to evaluate the results.
2. The Designs
In Chart I. the seventeen designs utilized in this test are represented. The Arabic numerals designate the final numbering of each design. The original number was 35 but fifteen were eliminated in a few of the early preliminary testings. The designs are graded in difficulty, which is increased by modifying the designs at various stages in the following manner:
- 1. By the use of the full colors;
- 2. By the use of few diagonaled sides;
- 3. By the use of all diagonaled sides;
- 4. By turning the design on one of its corners;
- 5. By eliminating the outside boundary line;
- 6. By increasing the number of blocks to be used;
- 7. By increasing dissymmetry in design;
- 8. By decreasing the number of different colors used in each design.
Chart I. Block Designs.
- Designs: 1, 2, 3, 4, 5, 6, 7, 8, 9—4 blocks.
- Designs: 10, 11—9 blocks.
- Designs: 12, 13, 14, 15, 16, 17—16 blocks.
- Colors (Windsor & Newton, Ltd.): red = carmine lake; blue = prussian blue; yellow = pale cadmium.
To perform the test, utilizing twenty designs, one averaged about an hour or an hour and a half. In the final revision three designs have been eliminated, leaving seventeen, thus decreasing somewhat the time necessary to apply the tests with no significant decrease of reliability. The criteria for rejection were based on correlations with those arrays of evidence presuming to yield an index of intelligence, such as is obtained through the use of the Binet Scale, as also upon the basis of the diagnostic value of each design determined by the progress of its curve with increasing chronological age. The results at present indicate that the block designs are as good as any single test in the Binet scale (though better in the sense of diagnostic value), as good as the Trabue Language Completion Tests, or any other similar single type test, whether involving the use of language or whether mere performance.
The designs, appropriately colored, are printed on medium thick, white, semi-gloss cardboard. The dimensions of the card are 3 by 4 inches. The printed designs, placed in the center of the card, are one fourth the size of the actual designs when the cubes are used. In other words, the face of a cube represented on the designs is only one half of an inch on each of its sides. Thus design no. 1 is one inch square, design no. 10 is one and a half inches square, and design no. 14 is two inches square.
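The arithmetic of the printed card can be checked with a short sketch. It assumes only what the paragraph above states: each design is a square arrangement of cubes, a real cube face is one inch on a side, and its printed face is half an inch on a side. The function and constant names are illustrative and do not come from the article.

```python
from math import isqrt

CUBE_SIDE_IN = 1.0     # edge of a real cube, in inches (from the text)
PRINTED_SIDE_IN = 0.5  # edge of a printed cube face, in inches (from the text)


def printed_design_side(n_blocks: int) -> float:
    """Side length, in inches, of the printed design for a square of n_blocks cubes."""
    per_side = isqrt(n_blocks)
    if per_side * per_side != n_blocks:
        raise ValueError("designs are square: expected 4, 9, or 16 blocks")
    return per_side * PRINTED_SIDE_IN


# Linear scale 1/2 gives an area scale of 1/4, i.e. "one fourth the size".
area_ratio = (PRINTED_SIDE_IN / CUBE_SIDE_IN) ** 2

for n_blocks in (4, 9, 16):
    print(n_blocks, "blocks:", printed_design_side(n_blocks), "inch square")
```

The three cases correspond to the four-block, nine-block, and sixteen-block designs (for instance designs 1, 10, and 14).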
The writer has found it of assistance to place in the lower right-hand corner the time limit for each design. These values follow:
|Design (Number)||Time Limit (Minutes)||Design (Number)||Time Limit (Minutes)|
The time limit as set for each design is about one minute longer than the time within which a correct response may reasonably be expected.
It may be of interest to remark that if the full limit is allowed on each test the working-time totals only 45 minutes for all of the seventeen designs. With practice an examination should average about thirty to forty minutes. In some cases it may take only fifteen or twenty minutes, in others perhaps an hour.
(B) The Directions for Applying the Tests
Seat the subject comfortably at a table, noting that his visual angle when working with the tests is not less than 45 degrees. Be sure that no designs are visible in your preliminary instructions, nor more than a single design at any one time. The blocks which are not being utilized should be kept in a box, apart, so that they are either invisible to the subject, or if visible, the blocks should be arranged so that the top sides are all of the same color.
Method: Part 1. For Subjects Who Can Understand Spoken Language
(Section A) Take a block. (Instructions to subject are placed in quotation marks. For Design 1 four blocks will have been removed from the box.) "Here are some blocks,—give me the name of the color on this side." Sides with the full color are presented first. Place your finger on the side designated. After the subject has responded, turn to another side. "And what is the color on this side?"—"Now the color here?"—"And what is the color here?"—If the subject has succeeded in naming the colors correctly, proceed with the experiment. (If he has failed, further instructions are given below, in Part 2.) Then the experimenter explains: "Now on this side we have blue and yellow, (point), and on this side red and white (point). And all the blocks are painted in the same way."
(Section B) "What you are to do is this: Take these blocks," (Shuffle them so that when finally placed before the subject, no more than one fourth of the blocks have topside colors which are present in the design, the separate blocks being placed apart, flat on table, and not piled one on top of another) "pick out the right colors, put them together, and make them look, on top, just like this." (Point to design 1.)
Give no further hints nor suggestions if the directions have been understood. Caution: Be sure that all the blocks are thoroughly shuffled before the design is presented. The purpose is to eliminate the possibility of studying the design before being ready to begin work with the cubes.
(Section C) If the subject has not understood what is meant, the experimenter may perform trial design (A) slowly, using pantomime freely, the subject watching closely, after which the subject is requested to repeat the operation. This may be repeated any number of times until the subject understands. When he does, proceed with the designs in order, beginning at (Section B), and continuing with (Section D).
(Section D) After the first design has been completed or failed, the blocks are again shuffled, observing the cautions in (Section B), and the subject is told again to "Take these blocks, pick out the right colors, put them together, and make them look on top just like this." (Point to design number 2.) The instructions remain the same for all the designs. The subject is not told at any time the number of the blocks he is to use.
Record. Both time and moves are recorded. A move is counted when a block is given its initial position on the table. Each separate and distinct change in the position of a block is counted a move. Sometimes a child will make three or four changes in the position of a cube, the topside remaining the same color (especially true of diagonal sides, e.g., red-white). But each change in position is counted a separate move. If success is not attained within the time limit, no credit is assigned. The time limits are indicated on the design cards.
The whole test is not regarded as complete unless there are, ordinarily, at least five consecutive failures on designs after the last success, and where doubt exists as to inability in the later designs, give as many designs beyond the last success as is deemed wise.
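The stopping rule just described can be sketched as a small check on the sequence of results. This is only an illustration: the names `trailing_failures` and `is_test_complete` are hypothetical, the figure of five failures is taken from the text as it stands, and the word "ordinarily" leaves the final decision to the examiner's judgment, which no code can capture.

```python
def trailing_failures(results: list[bool]) -> int:
    """Count consecutive failures at the end of the record (True = success)."""
    count = 0
    for succeeded in reversed(results):
        if succeeded:
            break
        count += 1
    return count


def is_test_complete(results: list[bool],
                     total_designs: int = 17,
                     min_failures: int = 5) -> bool:
    """True when every design has been given, or enough consecutive
    failures have followed the last success (assumed reading of the rule)."""
    if len(results) >= total_designs:
        return True
    return trailing_failures(results) >= min_failures


# One success followed by three failures: keep presenting designs.
print(is_test_complete([True, False, False, False]))
```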
Part 2. For Subjects Who Do Not Know the Names of the Colors
Take all the blocks out of the box and place on the table so that the single-colored faces are all on the top side of the cubes. Have an equal number of reds, yellows, blues and whites. Point to a red-topped block and ask the child to point to all the blocks that have the same color on top. Do the same for the other three colors. If the child can distinguish the colors, proceed with the test at (Section B).
Part 3. For Subjects Who Cannot Understand Spoken Language
By means of gestures and pantomime go through the procedure in Part 2. If the subject can distinguish the colors, proceed with (Section C), and through the various designs. The method of recording remains the same.
(C) The Score Card and the Method of Scoring
In the following table are presented the score values of each of the seventeen designs and the number of score points to be deducted if a design is successfully completed with excess time and with excess moves:
|Design No.||Score Value||Subtract 1 Point (Time)||Subtract 2 Points (Time)||Subtract 1 Point (Moves)|
|1||3||21" and over||none||6 and over|
|2||5||31" and over||none||7 and over|
|3||6||21" to 35"||36" and over||8 and over|
|4||6||31" to 1' 0"||1' 1" and over||10 and over|
|5||7||36" to 1' 5"||1' 6" and over||11 and over|
|6||7||36" to 1' 0"||1' 1" and over||12 and over|
|7||7||41" to 1' 10"||1' 11" and over||11 and over|
|8||8||41" to 55"||56" and over||10 and over|
|9||9||56" to 1' 10"||1' 11" and over||15 and over|
|10||9||1' 56" to 2' 10"||2' 11" and over||22 and over|
|11||8||1' 46" to 2' 30"||2' 31" and over||19 and over|
|12||9||2' 26" to 2' 40"||2' 41" and over||30 and over|
|13||9||2' 21" to 2' 33"||2' 34" and over||31 and over|
|14||9||2' 26" to 2' 40"||2' 41" and over||32 and over|
|15||9||2' 41" to 3' 0"||3' 1" and over||32 and over|
|16||10||2' 41" to 3' 5"||3' 6" and over||31 and over|
|17||10||2' 41" to 2' 55"||2' 56" and over||30 and over|
|Maximum score: 131 points.|
To clarify the table, one or two illustrations will be utilized. For example, design number two has a score value of 5. This full amount is attained if a reagent completes the design successfully in less than 31 seconds and with less than 7 moves. If 31 or more seconds are utilized, one point is deducted from the score, and if 7 or more moves are made an additional point is deducted. Take again design number thirteen which has a score value of 9. This full amount is attained if the subject completes the design successfully in less than 2 minutes and 21 seconds, and with less than 31 moves. If completed between 2 minutes 21 seconds and 2 minutes 33 seconds, one point is deducted, if 2 minutes 34 seconds or more are spent on the problem, two points are deducted. And if 31 or more moves are made an additional point is deducted from the score value of the design.
The scoring of a performance is a very simple matter. This will be self-evident from the following examples:
Example one: Design number 7 successfully completed in 1 minute and 23 seconds and at the end of 9 moves. Score 7, for successful completion, less 2 points for excess time. Final score 5. Example two: Design number 10 successfully completed in 1 minute 48 seconds, and after 19 moves. Score 9, for successful completion. No deductions for excess time or excess moves. Final score, 9. Example three: Design number 16, successfully completed in 3 minutes 27 seconds, and after 48 moves. Score 10, for successful completion. Deduct 2 points for excess time, and one point for excess moves. Final score 7.
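The scoring procedure illustrated above may be sketched in code. The thresholds below are transcribed from the score table in this section, converted to seconds; the function name, the tuple layout, and the `completed` flag are illustrative choices and not part of the original score card. Because the time limits themselves are not reproduced here, the caller is assumed to decide whether the design was completed within its limit.

```python
def mmss(minutes: int, seconds: int) -> int:
    """Convert minutes and seconds to whole seconds."""
    return 60 * minutes + seconds


# design: (score value, seconds at which 1 point is lost,
#          seconds at which 2 points are lost (None if not applicable),
#          number of moves at which 1 point is lost)
SCORE_TABLE = {
    1: (3, 21, None, 6),
    2: (5, 31, None, 7),
    3: (6, 21, 36, 8),
    4: (6, 31, mmss(1, 1), 10),
    5: (7, 36, mmss(1, 6), 11),
    6: (7, 36, mmss(1, 1), 12),
    7: (7, 41, mmss(1, 11), 11),
    8: (8, 41, 56, 10),
    9: (9, 56, mmss(1, 11), 15),
    10: (9, mmss(1, 56), mmss(2, 11), 22),
    11: (8, mmss(1, 46), mmss(2, 31), 19),
    12: (9, mmss(2, 26), mmss(2, 41), 30),
    13: (9, mmss(2, 21), mmss(2, 34), 31),
    14: (9, mmss(2, 26), mmss(2, 41), 32),
    15: (9, mmss(2, 41), mmss(3, 1), 32),
    16: (10, mmss(2, 41), mmss(3, 6), 31),
    17: (10, mmss(2, 41), mmss(2, 56), 30),
}


def score_design(design: int, seconds: int, moves: int,
                 completed: bool = True) -> int:
    """Score one design: full value less deductions for excess time and moves."""
    value, one_point_at, two_points_at, moves_at = SCORE_TABLE[design]
    if not completed:  # failure, or success only after the time limit
        return 0
    deduction = 0
    if two_points_at is not None and seconds >= two_points_at:
        deduction += 2
    elif seconds >= one_point_at:
        deduction += 1
    if moves >= moves_at:
        deduction += 1
    return value - deduction


# The three worked examples from the text.
print(score_design(7, mmss(1, 23), 9))
print(score_design(10, mmss(1, 48), 19))
print(score_design(16, mmss(3, 27), 48))
```

Summing the first element of every table entry gives the maximum score of 131 points, which serves as a check on the transcription.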
It may be worth remarking that successful performance, speed and what may be termed accuracy are all combined in the final score. Successful performance receives greatest weight, speed next and accuracy last. The weight ratio as explained elsewhere in the monograph is roughly 4 : 2 : 1. This ratio has been empirically determined and was not derived by arm-chair philosophizing. The prevalent opinion, which was at one time shared by the writer, that speed and accuracy cannot be combined in one score, does not hold with the Block Design Tests. The writer felt that success, speed and accuracy each had its own diagnostic importance and in order to make the tests most effective all should and must be taken into account in the final score summation. But more of this in the longer monograph.
(D) THE NORMS
The procedure involved in obtaining norms for the different designs was quite a complicated one, requiring a great deal of careful statistical work. In this effort the writer utilized the currently accepted standardization methods, with but slight modification. An explanation of the general procedure utilized together with a description of various methods of checking the results has been left for the later monograph. Suffice it to say that the score points mentioned in Table II. are to be interpreted in the same light as those of Buckingham in his standardization of his spelling tests, of Trabue in his standardization of his language-completion tests, and of Woody in his standardization of his arithmetic tests. In this section the final results, merely, will be presented.
Graph I. Mental Age Equivalents of Score Points.
Graph I. is the curve indicating the scores to be expected at the various ages from 3 years to 19 years. This curve has been smoothed but slightly within the range of ages below ten, though rather considerably from fifteen to nineteen. This was necessarily the result of a deficiency in the number of cases at the higher ages. The median score at each age is represented by a circlet with a dot enclosed.
|Score Points||Mental Age||Score Points||Mental Age||Score Points||Mental Age||Score Points||Mental Age|
|0||5-3 or below||33||10-9||66||13-5||99||15-9|
In Table III. are presented the mental age equivalents of each score from 1 (mental age 5 years 7 mos.) to 131 (mental age 19 years 11 mos.)
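A short sketch may help in handling the "years-months" notation of the table and in turning a mental age into an intelligence quotient. Two assumptions should be noted. First, the article does not spell out its I. Q. formula; the conventional ratio of mental age to life age, multiplied by 100, is assumed here. Second, only the three equivalents that survive in the table row above are included, and no attempt is made to interpolate the missing entries. All names are illustrative.

```python
def parse_age(text: str) -> int:
    """Convert a 'years-months' string such as '10-9' to total months."""
    years, months = text.split("-")
    return int(years) * 12 + int(months)


def format_age(total_months: int) -> str:
    """Convert total months back to 'years-months' notation."""
    return f"{total_months // 12}-{total_months % 12}"


def ratio_iq(mental_age_months: int, life_age_months: int) -> int:
    """Assumed conventional ratio I. Q.: 100 * mental age / life age."""
    return round(100 * mental_age_months / life_age_months)


# Only the equivalents legible in the table row above (score -> mental age).
KNOWN_EQUIVALENTS = {33: "10-9", 66: "13-5", 99: "15-9"}

mental_age = parse_age(KNOWN_EQUIVALENTS[33])
print(format_age(mental_age), ratio_iq(mental_age, parse_age("12-0")))
```

The sketch does not address what life age to use for adults, a question the closing section of the article treats as unsettled.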
(E) RELIABILITY OF THE TESTS
To measure the reliability of any newly devised test of intelligence is not a simple matter. It devolves upon the standardizer to present evidence that the new intelligence scale measures this inadequately defined entity 'intelligence' with approximately the same degree of accuracy as those standards or measuring 'rods' now commonly accepted and in current use.
In this brief article the writer will limit himself to five criteria:
- (1) The mental processes employed;
- (2) Increase in score from year to year;
- (3) Correspondence of median mental ages;
- (4) Correlations between mental ages, intelligence quotients and teachers' estimates of intelligence;
- (5) Conformity of intelligence-quotient distribution with normal probability.
(1) Mental Processes Employed
In devising and standardizing this test the writer did not approach the problem with any bias of 'faculty psychology.' The idea still seems prevalent, though not as much now as in the immediate past, that in order to possess an adequate measuring instrument for intelligence, the device must contain separate tests for each mental 'function': sensation, perception, association, imagination, memory, judgment, reasoning, etc. On the other hand it has been amply demonstrated that the only intelligence scales worth the name draw service freely from all 'functions.' Binet has pointed out that all 'intelligent' operations involve the functioning of three primary activities: first, attention to the problem presented; second, a conscious attempt on the part of the subject to consummate an adequate adaptation to the situation; and third, the exercise of auto-criticism in order to determine how efficiently the specific 'adaptation' has solved the problem. A cursory examination of the demands made upon the mental operations of the person tested with the block-designs will clearly reveal that attention, adaptation and auto-criticism are all involved in the successful accomplishment of each task. That point in the graded series of designs at which a child will begin failing to achieve further success, will be a rough measure of the development of his ability to attend, to adapt and to critically survey his general plan of performance and his ultimate accomplishment.
In his discussion of the 'patience test' in the 1908 scale, and these words might as well apply to the block-design tests, Binet states: "It is a game, but at the same time a work of the intelligence. When one analyzes the operation it is found to be composed of the following elements: (1) Consciousness of the end to be attained, that is to say, a figure to be produced; this end must be understood, and kept in mind; (2) the trying of various combinations under the influence of this directing idea, which often unconsciously determines the kind of attempt which should be made; (3) judging the combination formed, comparing it with the model, and deciding if it resembles the other" (p. 198). If 'intelligence' involves the following mental operations: analyzing, combining, comparing, deliberating, completing, discriminating, judging, criticising and deciding, then the block-design tests may, with justice, be said to call upon the functioning of intelligence and to that extent they are a measure of that mental capacity.
(2) Increase in Score from Year to Year
As regards the second criterion, reference to Graph I. and to the various tables presented in this article will clearly demonstrate that this requisite is satisfied. The following, however, should be mentioned: At each life age a greater scatter or range in ability is noticeable than is the case with the Binet tests. Whether this phenomenon argues for reliability or not is left for discussion in the later monograph.
(3) Correspondence of Median Mental Ages
At each life age do the median mental ages obtained by the block-design tests correspond with the median mental ages obtained by the Binet tests? This question is an important one, and the extent of correspondence or deviation should measure very largely the reliability of the newly devised tests.
In the following table this comparison is presented:
|Life Age Yrs.||No. of Cases||Median Binet Age Yrs.-Mos.||Median Block-Design Age Yrs.-Mos.||Difference Between Medians (Mos.)||Average of Two Medians Yrs.-Mos.|
Four important items are worthy of note: In the first place, the average deviation of the median Binet ages from the life ages at each year is 6 months; second, the average deviation of the median block-design ages from the life ages at each year is 8.8 months; third, the average deviation between the two intelligence-test medians is 8 1/2 months, and finally, the arithmetic mean of the two medians for each life age results in a more accurate approximation of what may be the 'true' mental age than either median taken alone. The significance of the last fact will have to be left for more complete discussion in the later monograph. At this point, it may be sufficient to remark that the approximation between the Binet and block-design medians is rather close, especially when we consider that the block-design tests are quite free of the 'language factor.'
(4) Correlations between Mental Ages, Intelligence Quotients and Teachers' Estimates of Intelligence
In order to understand and to justly evaluate the relations about to be presented, the Binet results will be mentioned to serve as a standard of comparison.
- 1. The correlation between Binet age and life age is + .80 (P.E. ± .01) (291 public school cases).
- 2. The correlation between block-design age and life age is + .66 (P.E. ± .02) (291 public school cases).
- 3. The correlation between Binet age and block-design age is + .82 (P.E. ± .01) (366 cases). The table is herewith presented:
(Note: Age 10 means 10-7 to 11-6, etc.)
- 4. The correlation between Binet age and block-design age is + .81 (P.E. ± .01) (291 public school cases).
- 5. The correlation between Binet age and block-design age is + .67 (P.E. ± .05) (75 feebleminded cases).
- 6. The correlation between Binet I. Q. and block-design I. Q. is + .80 (P.E. ± .01) (366 cases). The table is herewith presented:
- 7. The correlation between Binet I. Q. and block-design I. Q. is + .57 (P.E. ± .03) (291 school children).
- 8. The correlation between Binet I. Q. and block-design I. Q. is + .67 (P.E. ± .05) (75 feebleminded cases).
- 9. The correlation between teachers' estimates of intelligence and Binet I. Q. is + .47 (P.E. ± .03) (291 school children).
It may be worth remarking that although the correlation between block-design age and Binet age is + .82, teachers' estimates of intelligence correlate only one half as much with the Block-design I. Q.'s as with the Binet I. Q.'s. The reader may recall that one of the original objections to the Binet scale was that it measured school training. Only to a limited extent has this been denied, the explanation having been made that the tests measure intelligence through the medium of knowledge only partly influenced by school training. It has been admitted, true, that practically all children are exposed to these educational influences, but the ultimate difference in achievement is explainable on the basis of differences in endowment. However this may be, the results of the block-design test would perhaps tend to show that there is more to this charge than we have been inclined to admit. It will, no doubt, be acceded without much question that the block-design tests are less affected by school training than the Binet.
At any rate the total correlational evidence seems to indicate that the block-design tests possess a high degree of reliability.
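The probable errors quoted with the correlations above can be approximated with the classical formula for the probable error of a Pearson coefficient, P.E. = 0.6745 (1 - r^2) / sqrt(N). The article does not state which formula was used, so its application here is an assumption, and the function name is illustrative.

```python
from math import sqrt


def probable_error_r(r: float, n: int) -> float:
    """Classical probable error of a Pearson r (assumed, not stated in the article)."""
    return 0.6745 * (1 - r * r) / sqrt(n)


# (r, number of cases) pairs as reported in the list of correlations.
reported = [(0.80, 291), (0.66, 291), (0.82, 366), (0.81, 291),
            (0.67, 75), (0.57, 291), (0.47, 291)]

for r, n in reported:
    print(f"r = {r:.2f}, N = {n}: P.E. = {probable_error_r(r, n):.3f}")
```

Under this formula the values for the larger groups agree with the printed figures to two decimals. For the 75 feebleminded cases the formula gives a value nearer .04 than the printed .05, so a different rounding or a slightly different formula may have been used for that group.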
(5) Conformity of Intelligence Quotient Distribution with Normal Probability
A very necessary index in weighing the reliability of any standardized test is to determine the extent to which an actually found distribution conforms to its theoretical distribution.
In the following table are presented the I. Q.-range distributions for the Binet and the block-design tests. The respective percentage values are compared with what one should theoretically expect.
(Median at 99)
The average deviation from theoretical expectation for the Binet I. Q. ranges is 3.3 per cent, per I. Q. group. The average deviation for the block-design tests is only 1.4 per cent, per I. Q. group.
In conclusion, one may state that the evidence presented seems to indicate not only that the tests measure intelligence, but that this is accomplished with a fair degree of accuracy. On the other hand, one should bear in mind Stern's caution: "Psychological tests must not be overestimated, as if they were complete and automatically operative measures of mind. At most, they are the psychographic minimum that gives us a first orientation concerning individuals about whom nothing else is known, and they are of service to complement and to render comparable and objectively gradable other observations—psychological, pedagogical, medical—not to replace these."
In his 'Stanford Revision of the Binet-Simon Scale' (Warwick and York, 1917) Terman states (p. 150) that "to be widely serviceable a test should demand only the simplest material or apparatus, should require at most but a few minutes of time, and should lend itself well to uniformity of procedure in application and scoring." The writer has attempted to satisfy these demands in standardizing the block-design tests. Those who utilize the tests will find after a little practice that there can be but little variation in the findings of two examiners, and that the only chance for difference is in the recording of the number of moves made.
The special value of the block-design tests lies in the fact that valid results may be obtained independently of the 'language factor.' Neither deafness nor lack of language understanding should be disqualifications in the proper performance of the test. The block-designs may therefore be utilized in the study of racial differences, in determining the mental capacities of the deaf and of those suffering from various other language handicaps.
As regards the borderzone problem, although further investigation of this matter by the writer is now under way, it seems that this test will aid in a better differentiation of the group of cases falling in this category. The writer maintains that feeblemindedness is not an arbitrary statistical designation, but is rather a clearly demarked physiological entity quite distinct from normality, statistical-psychologists notwithstanding. Years of experience with this type of defect have fixed the notion in the writer's mind that feeblemindedness is indicative not only of mental mal-functioning, but also of physiological mal-functioning, especially of endocrine character. The results of further research, however, can be the only tests of the truth of one's statements at this time.
The writer regrets the omission of much pertinent material in this brief presentation, but the later monograph will deal with many topics here barely touched upon, if at all.
Regarding the Average Mental Age of Adults
Of importance in interpreting the results of this newly devised mental test is the recently raised question regarding the average mental age of adults. In the promised monograph a few pages will be devoted to a psychological and statistical discussion of this important matter. At this point the writer merely wishes to state that the data so far presented does not warrant accepting the suggestion that "The previous notion that the average intelligence of adults is 16 years should be given up." There is a fundamental fallacy underlying the suggested 13 to 14 year criterion, a complete discussion of which must be left for a later time.
- Psychologist to the Court of Domestic Relations.
- Trial Design (A) is represented on the pages with the other designs and is used only when under the provisions of Section C further preliminary explanation is necessary. Trial Design A is a four-block design, two full red sides above, two full yellow sides below.
- The Development of Intelligence in Children, Publication No. 11, Vineland, 1916.
- Pearson's Coefficient (r) used throughout.
- 'The Psychological Methods of Testing Intelligence,' Warwick and York, 1914, p. 12.
- E. A. Doll: New Jersey State Prison, Psychologist's Report (1918-1919), p. 72.