 
The Multiple-choice Cloze Test as a Placement Examination: A Critical Analysis
 
Thomas J. Mathews
Weber State University
 
Published in Delaware Papers in Linguistics 1(1), 38-53 (1988)
 
0.    Introduction 
 
In this paper I will look at the current foreign language placement exam given to prospective Spanish students at the University of Delaware. I describe the exam and its functions in sections 1, 2, and 3. In section 4, statistical data are presented bearing on the performance of the exam. Finally, in sections 5 and 6, I make suggestions, based on these findings and on the literature, for improving such placement exams. 
 
1.    Purposes of a placement exam 

In the American university system the great majority of incoming students have already completed at least two years of foreign language study. Indeed, most universities require two years of foreign language study of most applicants. At the same time, university degree requirements, at least in the Liberal Arts areas, call for the equivalent of two years of foreign language study at the university level. In most cases, any previous foreign language study will count toward fulfillment of this requirement; yet, because high school curricula are less intensive than those found in universities, and because they vary greatly from district to district and teacher to teacher, some instrument is necessary to determine how much foreign language study any particular student needs in order to complete a university foreign language requirement. 

This instrument might be as complicated and time-consuming as an oral interview procedure, or as simple as placing students according to the number of foreign language hours completed and their high school grades. Whatever type of test is used, it must be able to predict accurately the ability of students to perform in the foreign language and place them at an appropriate class level. 

This helps students by optimizing their resources at the university. They can save both time and money by placing into the highest possible level language class. It also helps each class by creating a more linguistically homogeneous group of students. 
 
2.    Attributes of a placement exam 

A placement exam needs to be constructed in such a way that it can be administered quickly and efficiently. It must also be remembered that because such tests are administered to freshman students, often before they have acquired any real college experience, the affective impact of the test may be unusually high. The test needs to be short because the orientation periods for incoming students do not permit a great deal of time for testing. The format of the test should also allow for expeditious evaluation. Multiple-choice is undoubtedly the optimum exam format: it can be administered by someone who does not speak the language being tested, and it can be corrected almost immediately. 

A placement exam should also accurately place all students regardless of the amount of time that has elapsed since they last studied the language. That is, it should be able to predict whether a low score is the result of forgotten material that will be easily remembered, or whether the material was never really learned in the first place. This is no easy task, and in fact it may not even be possible. I feel, however, as shown in section 4.2.2., that if we know beforehand what a student's foreign language experience is, specifically how much and how long ago he has studied, we can make a better prediction based on a test score than if we ignore such background information. 
 
3.0.    Assumptions of a placement exam 
  
A placement exam assumes that students taking it will want to do well; that is, that they will try to attain the highest possible number of correct answers. The exam also assumes that the number of items it contains is adequate to place students into the necessary levels. These two assumptions are in fact tacit in most testing situations. In the case of placement exams, however, they are very often inaccurate. This is especially true of the first: a great number of students, as I will show later, purposely do poorly on the exam. 

3.1. Student reasons for studying foreign language 

The first assumption above presumes that students will try to do as well as they can on a placement exam. Yet it has become rather obvious that many students go out of their way to do poorly on the exam in order to be placed in a low-level foreign language class. They may do this for a variety of reasons. First, a great number of students, through some sort of misguided modesty, believe that they learned less than they actually did in their previous foreign language study. They therefore feel that it is in their best interest to repeat elementary classes. Second, their apprehension about the university environment and their anticipation of academic standards much higher than what they have been used to lead them to believe that although a class is labeled 'introductory' or 'elementary' and numbered at the 100 level, it will be far beyond their ability. They frequently persist in this belief even after a placement exam shows that they belong in such a class. Third, many students place great emphasis on the continuity of their program and want to begin their foreign language course of study at the beginning just to make sure they don't 'miss' anything. Last of all, a certain number of students are after the notorious 'easy A'. 

During the initial week of a first-semester Spanish class in which the author was the instructor, students were required to write a short essay, in English, about their motivations for studying Spanish. They were encouraged to be honest in their responses and were told that negative responses would not affect their grade on the essay. Several of their remarks follow. 

One student, who had studied two years of Spanish in high school, but had not taken a foreign language class immediately upon entering the university wrote that 'because of the year and half lapse, I have decided to take Spanish 101 so I won't be over my head anywhere else.' Another student wrote something similar: 'Because learning a new language requires a lot of practice, I feel that it is best for me to start off in Spanish 101 since I have not been involved in a Spanish class for three years.' 

Another student had studied three and one-half years in high school and had previously taken 101 at the University of Delaware, yet was repeating the course on a no-credit basis. He wrote: 'The reason that I am taking SP 101 again this semester is to refresh my memory in Spanish before I move to higher level Spanish courses.' The fact that this student is willing to pay for the opportunity to repeat a course, based on his confessed lack of confidence, indicates that there is a strong student compulsion to start a university foreign language program at the beginning. This same student, when taking the placement exam during the first week of the semester, scored high enough to have been placed into SP 102 (the second semester of the elementary level). 

Other students want to begin in 101 because they feel unprepared from their high school experience. One wrote: 'My high school years were crazy. I was a terrible student. I hated Spanish and didn't learn anything.' Another student wrote: 'I took the placement exam for Spanish and I was knocked off my feet—I scored 5 out of 50. So, now I am starting over with a low retention level.' Interestingly, this student (number 25 in the appendix) scored 12 on the exam during the first week of class, and one month later took it again and scored an impressive 23, which would have placed her in third semester Spanish. Yet another student with four years of high school Spanish wrote: 'Although I did not want to take Spanish wholeheartedly in high school, now, in college, I really want to learn the language.' 

4.0. The current exam at the University of Delaware 

The placement exam that is currently given at the University of Delaware is a multiple-choice cloze test with 50 blanks, selected using a fixed-ratio deletion of every sixth word. Six different tests are used for the languages offered at the University (Spanish, French, German, Italian, Russian, and Latin), but for the purposes of this paper I will discuss only the Spanish exam. The other exams are similar in most respects; concentrating on the Spanish exam alone may provide a clearer picture of the virtues and problems associated with the exams as they now exist. 

4.1. A description of the exam 

The Spanish exam is based on a narrative text which, including the instructions, fits on two sides of a single printed sheet. Each blank, resulting from the deletion of a word, is numbered, and four choices are offered in either the right or the left margin. Students are given a copy of the cloze text, a computerized answer sheet, and a pressure-sensitive two-copy form to be filled out by the rater with the placement score. After the student completes the exam, the administrator corrects it immediately and indicates the score and placement on the form. The answer sheet and one copy of this form are kept by the department, and the student is given the remaining copy. 

There are two problems with the placement exam as it now exists. The first has to do with face validity. The single sheet of paper upon which the test is printed on both sides is crowded. There are certain advantages to having the entire exam fit onto a single sheet; the foremost of these is that it may discourage academic dishonesty on the part of the students. Another advantage is that the integrative quality of the passage that has been made into a cloze is readily visible to the student taking the test. But the test, as it is, is often confusing. The student must remember to switch from one margin to the other when looking for the answers. Even then, the possible answers are printed in small type and are rather close together. Also, because of poor copying and overuse (the exam is administered approximately 1,000 times each semester), many individual copies of the exam are faded, and portions at the extreme top or bottom of the sheet are cut off. Another factor that affects the face validity is the perceived simplicity and lack of importance associated with a two-page exam. It is my opinion that a student who is considering purposely scoring low on an exam may feel that doing so is less dishonest on a seemingly 'less important' exam. 

The second problem has to do with scoring. The cut-off points and probabilities are shown in Table 1. These probabilities predict that a set of random answers will result in a 101 placement just over 74% of the time. The difficulty arises when we see that random answers will result in a 102 placement almost 25% of the time. In real terms this means that roughly one out of every four students who answers at random will be placed into 102, regardless of foreign language background. 

 
Table 1
Cut-off levels and probability under the null hypothesis

Class Placement    Number of Items Correct    Normal Approximation to Binomial Probability
SP 101             0-14                       .7422
SP 102             15-19                      .2468
SP 111             20-25                      .001
SP 112             26-30
SP 200+            31 or more

The fact that the placement exam is a cloze test may lead us to believe that the items on it are dependent: that is, that knowing the correct answer to any particular item depends to some extent on knowing the answers to the items that precede it. Indeed, in standard cloze tests this is the case. But because this exam is multiple-choice, the chance probability remains the same on any single item and, correspondingly, on the test as a whole. That is, the probability of guessing any single item correctly is always one in four, regardless of how many other items have been answered correctly. Certainly, a student who considers the global context of the test will perform at a much higher level than random probability. 
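
For readers who wish to check the figures in Table 1, the chance-placement probabilities can be reproduced with a few lines of code. The sketch below uses the exact binomial distribution rather than the normal approximation reported in the table, so its values will differ slightly from those shown above; the band boundaries are taken directly from Table 1.

    from scipy.stats import binom

    N_ITEMS = 50          # items on the exam
    P_GUESS = 0.25        # four choices per item

    # Placement bands from Table 1: (label, lowest score, highest score)
    bands = [("SP 101", 0, 14), ("SP 102", 15, 19), ("SP 111", 20, 25),
             ("SP 112", 26, 30), ("SP 200+", 31, 50)]

    for label, lo, hi in bands:
        # Probability that a purely random answer sheet lands in this band
        p = binom.cdf(hi, N_ITEMS, P_GUESS) - binom.cdf(lo - 1, N_ITEMS, P_GUESS)
        print(f"{label}: {p:.4f}")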

The solutions to these two problems, that of face validity and serendipitous placement into a higher level than is appropriate, may not be very complicated, and some suggestions to remedy the situation are presented later in this paper. 

4.2. Statistical observations 

During the fall semester of 1987 the placement exam was administered several times to three introductory Spanish classes (SP 101) at the University of Delaware. Each class had approximately 30 students. Most of the students had already taken the exam once, prior to the first day of class; I refer to this administration as the Pre-test. In one of the classes the exam was given a second time, on the first day of class, to all of the students who had previously studied Spanish; I refer to this as Test 1. Five weeks later the exam was given to all students in all three classes; this is called Test 2. During the same week, each student was given two additional exams: a 15-minute oral exam, which included reading aloud and an interview, and a one-hour written exam, which included a dictation, a reading, and a short composition. The scores for the oral and written exams were combined so that correlations between them and the placement exam could be calculated; this average score is referred to as Grade. Other relevant data gathered included how many years of Spanish each student had studied and how long ago they last studied the language. 

4.2.1. Prediction of student performance 

Two kinds of statistical analyses were performed on the data. The first deals with the variation within the individual variables; the second, with correlations between the different variables. The results of the univariate analyses are shown in Table 2. 

 
Table 2
Univariate statistics for the Pre-test, Test 1, Test 2, and Grade

                Pre-test    Test 1    Test 2    Grade
Sample size     50          30        85        87
Mean            11.24       12.83     15.15     86.87
Stand. Dev.     2.75        3.24      4.50      9.88
Range           13          13        25        65

Test 1 was given with the hope that the students might do significantly better, because they knew that a high score would not result in their having to enroll in a more difficult class. But the results of Test 1 were very similar to those of the Pre-test. The means of these two tests are certainly within the range that one would expect in a 101 class, since the cut-off for placement above that level was 15, but they are also suspiciously close to the expected score under random guessing, which is 12.5. In light of this, the mean score of 11.24 for the sample group on the Pre-test can hardly be very meaningful. If anything, the fact that the actual mean is lower than the mean expected under the null hypothesis may indicate that some students purposely scored low. 
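
The expected score under random guessing follows directly from the binomial distribution. The short calculation below assumes nothing beyond the 50-item, four-choice format and is included only to make the comparison explicit.

    import math

    n_items, p_guess = 50, 0.25                  # 50 items, four choices each
    expected_mean = n_items * p_guess            # 12.5 items correct by chance
    expected_sd = math.sqrt(n_items * p_guess * (1 - p_guess))   # roughly 3.06

    print(expected_mean, round(expected_sd, 2))
    # Compare with the observed Pre-test mean of 11.24 and standard
    # deviation of 2.75 reported in Table 2.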

Table 3 shows correlation statistics for several of the variables. One question of immediate concern is the possible presence of a correlation between the Pre-test and the Grade (shown as 3 in Table 3). Of the total population of 87 students, 50 took the Pre-test. Those who did not had either never studied Spanish before or had failed to attend one of the freshman orientation sessions at which the test was administered. The correlation coefficient for the Pre-test and the Grade, where Grade was computed only for those individuals who had taken the Pre-test, is 0.23 (p > .1), indicating that there is indeed a low correlation but that it is not significant. 

 
Table 3
Correlations: test, grade, years of study

Variables                  Correlation
1. Test 2 / Grade          r = 0.4032*
2. Years / Grade           r = 0.3022*
3. Pre-test / Grade        r = 0.2284**
4. Pre-test / Years        r = 0.2096**
5. Pre-test / Test 2       r = 0.1164**

* p < .01 
** p > .1 

The correlations shown in Table 3 are ordered from high to low. The highest correlation (number 1 on the table) was found between the scores on Test 2 and the Grade. Although the correlation of r = .4 is moderate, the probability that such a correlation occurred by mere chance is close to zero. This is interesting but is of little help in solving the problems presented by a placement exam. The students are undoubtedly earning scores that more closely reflect their actual ability, because they have had a month of review of the material. It follows that we might do well to try to place students after such a review. But, although it may make pedagogical sense, it is logistically implausible to try to place students once the semester has begun. 
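
As a minimal sketch of the kind of calculation behind the coefficients in Table 3, the following assumes the paired scores are available as two equal-length lists; the values shown here are hypothetical and are not taken from the study's data.

    from scipy.stats import pearsonr

    # Hypothetical paired observations: each position corresponds to one student.
    pretest_scores = [12, 14, 10, 13, 15, 11, 12, 16, 13, 14]
    grades         = [83, 92, 70, 88, 95, 79, 84, 97, 86, 90]

    r, p_value = pearsonr(pretest_scores, grades)
    print(round(r, 4), round(p_value, 4))
    # r is the correlation coefficient; p_value is the probability of seeing
    # a correlation at least this strong if the two variables were unrelated.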

4.2.2. Effects of delays in studies 

The second correlation shown in Table 3 is between the number of previous years of Spanish study and the Grade. Although less impressive than the correlation between Test 2 and Grade, it is similar, and more importantly, this information is available before the beginning of the semester. It would be fortunate if this kind of information could in fact be used as the sole input for placement, since it would make a placement exam unnecessary. The correlation shown, however, is probably not sufficiently large to predict performance in foreign language classes with greater certainty than the placement exam currently in use. That is, a relationship certainly does exist, and such a relationship does not exist between the Pre-test and the Grade, but basing placement on this correlation may prove problematic. Without a higher correlation, students will not be convinced that they should risk their time, their grades, or their money in a class which, for whatever reasons of their own, they feel will be too difficult for them. 

One of the problems that would almost certainly be raised by students is the amount of time elapsed between last studying a foreign language and registering for a college class. If placement were to be based on the number of years studied, students who have allowed more time to pass may well feel that they are disadvantaged. Table 4 shows the univariate statistics for these variables. 

 
Table 4
Previous language experience

                Years of Spanish    Number of Years Ago
Sample size     87                  76
Mean            1.91                1.75
Stand. Dev.     1.19                1.56
Range           4                   6
 
If the student contention were true, the longer the amount of time that elapses, the worse the grades should be. This would imply a negative correlation, one that is certainly not found in the data. It is evident that there is no real correlation between the amount of time elapsed since previous study of Spanish and student performance as reflected in Grade (r = .1635, p > .1). This is further evidence that the number of years of foreign language study may be a reasonable indicator for placement. 

4.2.3. Placement in higher levels 

Feidy-Bashear (1988) administered a battery of three oral tests to three different groups of students studying Spanish at the University of Delaware. She categorized the students as 'elementary,' 'intermediate,' and 'advanced,' as determined simply by the class in which they were enrolled when she administered the tests; that is, the 'elementary' students were enrolled in 102, the 'intermediate' students in 112 (second semester of the second year, the final required course), and the 'advanced' students in 205 (a conversation course). As for the tests, she first had the students make judgments about the grammaticality of a series of Spanish sentences. She then gave them a picture description task in Spanish, and finally engaged them in an oral interview. Her testing methods thus combined communicative and discrete-point, grammar-based techniques. Scoring was based separately on the grammaticality of the responses and the communicative ability demonstrated. 

One of the statistical tests that Feidy-Bashear ran on her data was an ANOVA. The results showed that there was no significant difference in the grammatical ability of the three groups tested; in fact, the three groups seemed very similar. The entire sample was then ranked into three groups according to the test results, irrespective of the students' actual class placement of elementary, intermediate, or advanced. The original placement is referred to as the 'actual group,' whereas the test results placed the students into 'predicted groups.' Table 5 shows the results of Feidy-Bashear's groupings. 

 
Table 5
Classification results from Feidy-Bashear (1988, p. 55) for grammatical test scores

Actual Group        Number of Cases    Predicted Group Membership
                                       1                 2                 3
Group 1 (SP 102)    24                 (n = 13) 54.2%    (n = 4) 16.7%     (n = 7) 29.2%
Group 2 (SP 112)    22                 (n = 5) 22.7%     (n = 12) 54.5%    (n = 5) 22.7%
Group 3 (SP 205)    12                 (n = 3) 25.0%     (n = 4) 33.3%     (n = 5) 41.7%
 
Of the 24 students in the actual elementary group (that is, the students from the 102 class), just over 54% were in the predicted elementary group. This means that about 46% of them were predicted to belong to either the intermediate or advanced groups based on their grammatical performance. An even more dramatic distribution can be seen for the intermediate and advanced groups. In fact, of all the students, only 51.72% were predicted to be in the group in which they actually were. 
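
The 51.72% figure can be verified directly from the counts in Table 5; the sketch below simply sums the diagonal of the classification matrix and divides by the total number of cases.

    # Rows: actual groups (SP 102, SP 112, SP 205); columns: predicted groups 1-3.
    table5 = [
        [13, 4, 7],
        [5, 12, 5],
        [3, 4, 5],
    ]

    correct = sum(table5[i][i] for i in range(3))    # diagonal: correctly classified
    total = sum(sum(row) for row in table5)          # 58 students in all
    print(correct, total, round(100 * correct / total, 2))   # 30 58 51.72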

When Feidy-Bashear compared the three groups on their communicative ability ratings, she found that the students performed more consistently at the level to which they actually belonged: the higher-level students communicated more efficiently than the lower-level students. However, the elementary students showed a greater amount of variation than the advanced ones. This is the opposite of what we might expect to see. If all students somehow begin their studies of a foreign language at the same level, we would expect some to pull ahead and some to fall behind as they progress through several semesters of study. Therefore, the widest variation within a class should be expected at the upper levels. The fact that this is not the case indicates a weakness in the placement exam, or poor self-placement by students. 

From the grammatical point of view, these data show that the current method of placement at the University of Delaware, based on a multiple-choice cloze format, is only about 50% accurate, a rate that might be as easily obtained by chance. Students who are true beginners in an elementary class should not necessarily have to learn in the same environment as students who are not beginners. It may be, however, that the placement exam is not testing grammatical ability. The exam was in fact not created to test grammatical knowledge exclusively. But as Feidy-Bashear's set of communicative data points out, students are not appropriately placed even using this parameter. 

If a homogeneous grouping is for some reason not feasible, the instructor should at least be aware of the differences from the outset. The current situation may lead both students and instructors to believe that since a placement exam has been administered, all of the students are at the same level. 

5. Possible placement procedures 

There are not many practical methods that can be used for placement exams. As mentioned earlier, a multiple-choice test seems highly desirable because of the ease, speed, and efficiency of scoring. A solution to the problems posed by the current placement exam may be provided by the Rasch technique, a graded system originally devised by Rasch (1960), further developed by Wright and Stone (1980), Masters (1982), and Wright and Masters (1982), and applied specifically to language testing and cloze tests by Klein-Braley (1981, 1984), in which a series of C-tests, graded from easiest to most difficult, is given to the student. A series of multiple-choice cloze tests might be developed in which students could be placed based on their passing any particular one of the shorter passages. If the first cloze test included in such a placement exam were noticeably easier for the students, this might also help to reduce some of the affective problems that have been associated with the placement exam in the past. This type of procedure would also help to eliminate some of the face validity problems found in the current exam. 
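
As a point of reference, the Rasch model itself reduces to a single logistic function of the difference between a person's ability and an item's difficulty. The sketch below is a generic illustration of that model applied to a graded series of passages; the difficulty values are invented for the example and do not describe any particular test.

    import math

    def rasch_probability(ability, difficulty):
        """Rasch (one-parameter logistic) model: probability that a person of
        the given ability answers an item of the given difficulty correctly."""
        return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

    # Hypothetical difficulties for a graded series of short cloze passages,
    # ordered from easiest to hardest (values chosen only for illustration).
    passage_difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]

    student_ability = 0.5
    for i, d in enumerate(passage_difficulties, start=1):
        print(f"Passage {i}: P(correct item) = {rasch_probability(student_ability, d):.2f}")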

A second possible solution to the placement problem, as suggested in section 4.2.2., would be to assign students to appropriate classes based on the number of years of foreign language study. This type of solution requires more statistical study, but, if it proves to be a viable alternative, it would result in a great savings of money and manpower. Indeed, foreign language placement would become nearly automatic. Once a formula has been devised for assigning students, those students who inevitably insist that they didn't learn anything in high school, or that it has been too long since they last studied, may still be allowed to take a lower level course, but without the opportunity to receive credit. 
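
As a purely hypothetical illustration of how such an automatic assignment might look, the sketch below uses invented cut-offs; an actual formula would have to come from the kind of statistical study described above.

    def place_by_years(years_of_spanish):
        """Assign an initial course level from years of previous study.
        The cut-offs here are hypothetical placeholders, not a validated formula."""
        if years_of_spanish < 1:
            return "SP 101"
        elif years_of_spanish < 2:
            return "SP 102"
        elif years_of_spanish < 3:
            return "SP 111"
        elif years_of_spanish < 4:
            return "SP 112"
        else:
            return "SP 200+"

    print(place_by_years(2))   # -> "SP 111" under these placeholder cut-offs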

6. Summary 

I have shown that a multiple-choice cloze exam such as the one currently used at the University of Delaware may not be an adequate predictor for level assignment. In fact, the number of years that a student has previously studied a foreign language seems to be a better predictor of class level. There are two alternative solutions, both of which deserve investigation: one is to improve the placement exam; the other is to use previous study as the sole input for placement. 

The best path to follow for an improved placement exam seems to be the Rasch technique as applied by Klein-Braley. Such an exam might, because it is graded from easy to difficult, lower students' compulsion to do poorly intentionally. Also, the test might easily be made longer than the current fifty items. With a hundred items, for example, it would be easier to avoid the current placement exam's problem of confusing poor placement with random answers. A longer exam might also appear more important to students, and they might therefore make a greater effort. Students take only 20 or 30 minutes to complete the fifty-item exam discussed in this paper; doubling the number of items, and the amount of time necessary to administer the exam, would not become an administrative problem. 
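
To illustrate the point about test length, the sketch below compares the probability that a purely random answer sheet clears the lowest cut-off on the current 50-item exam with the corresponding probability on a hypothetical 100-item exam whose cut-off is simply doubled; the doubled cut-off is an assumption made only for this comparison.

    from scipy.stats import binom

    P_GUESS = 0.25

    # Probability of scoring at or above the cut-off by guessing alone.
    def chance_above_cutoff(n_items, cutoff):
        return 1.0 - binom.cdf(cutoff - 1, n_items, P_GUESS)

    print(round(chance_above_cutoff(50, 15), 3))    # current exam: cut-off of 15
    print(round(chance_above_cutoff(100, 30), 3))   # hypothetical doubled exam
    # Under these assumptions the longer exam lets fewer random answer
    # sheets slip past the lowest cut-off.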

7. References 

Bachman, L. F. (1985). Performance on cloze tests with fixed-ratio and rational deletions. TESOL Quarterly, 19, 538-556. 

Briere, E. J. (1972). Are we really measuring proficiency with our foreign language tests? In H. B. Allen & R. N. Campbell (Eds.), Teaching English as a second language: A book of readings

Brown, J. D. (1981). Newly placed students versus continuing students: Comparing proficiency. In J. C. Fisher, M. A. Clarke & J. Schacter (Eds.), On TESOL '80. Washington DC: TESOL. 

Brown, J. D. (1983). A closer look at the cloze: Validity and reliability. In J. W. Oller (Ed.), Issues in language testing research, 237-250. Rowley, MA: Newbury House. 

Clark, J. L. D. (1972). Foreign language testing: Theory and practice. Philadelphia: The Center for Curriculum Development. 

Farhady, H. (1979). Test bias in language placement examinations. In C. Yorio & J. Schacter (Eds.), On TESOL '79. Washington DC: TESOL. 

Farhady, H. (1982). Measures of language proficiency from the learner's perspective. TESOL Quarterly, 16, 43-59. 

Feidy-Bashear, L. (1988). Metalinguistic awareness in classroom foreign-language learning: Implications for teaching grammar in an interactive method. Unpublished doctoral dissertation, University of Delaware. 

Klein-Braley, C. (1981). Empirical investigations of cloze tests: An examination of the validity of cloze tests as tests of general language proficiency in English for German university students. Doctoral dissertation, Duisburg. 

Klein-Braley, C. & Raatz, U. (1984). A survey of research on the C-test. Language Testing, 1, 134-146. 

Lado, R. (1961). Language testing: The construction and use of foreign language tests. New York: McGraw-Hill. 

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174. 

Oller, J. W. Jr. (1979). Language tests at school. Essex, England: Longman. 

Porter, D. (1978). Cloze procedure and equivalence. Language Learning, 28, 333-340. 

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press. 

Spurling, S. & Ilyin, D. (1985). The impact of learner variables on language test performance. TESOL Quarterly, 19, 283-301. 

Wright, B. D. & Stone, M. H. (1980). Best test design: Rasch measurement. Chicago: MESA Press. 

Wright, B. D. & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago: MESA Press. 

8. Appendix: The database 

Each of the following rows of data represents one student in a 101 Spanish class. Missing values are represented with a period (.). The variables listed here are as follows. 
 

1: CLASS (1 = 101 section 13; 2 = 101 section 10; 3 = 101 section 11)
2: SEX (1 = male; 2 = female)
3: PRE-TEST: placement exam taken before the semester began
4: TEST 1: placement exam given during the first week of the semester
5: TEST 2: placement exam given during the fifth week of the semester
6: GRADE: average of written and oral exams given during the fifth week of the semester
7: YEARS: number of years of Spanish studied
8: AGO: number of years since the student last studied Spanish
 
 
ROW  CLASS  SEX  PRE-TEST  TEST 1  TEST 2  GRADE  YEARS  AGO
[Individual data values for the 87 students omitted.]
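
A minimal sketch of how a data file in this format might be read for analysis, assuming whitespace-delimited columns in the order listed above and a period for missing values; the file name is hypothetical.

    # Read the appendix database: whitespace-delimited columns in the order
    # ROW, CLASS, SEX, PRE-TEST, TEST 1, TEST 2, GRADE, YEARS, AGO,
    # with "." marking a missing value.

    def parse_value(token):
        return None if token == "." else float(token)

    students = []
    with open("placement_data.txt") as f:        # hypothetical file name
        for line in f:
            tokens = line.split()
            if len(tokens) != 9:
                continue                         # skip headers and blank lines
            values = [parse_value(t) for t in tokens]
            students.append(dict(zip(
                ["row", "class", "sex", "pretest", "test1",
                 "test2", "grade", "years", "ago"], values)))

    print(len(students), "student records read")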