Apr 9, 2014

Revising the SAT To Make It Even Worse

by Jim Loewen

tags: SAT

Happily, the Educational Testing Service (ETS) — the folks who bring us the SAT — have heard the increasing protests against their product. To be sure, they never gave much heed to FairTest, the nonprofit in Cambridge, MA, that for two decades has mounted devastating critiques of their products. Nor did they pay attention to faultfinding writers like Nicholas Lemann, Howard Gardner, or James Crouse and Dale Trusheim. But they do pay attention to that #1 critic of all: the market.

More and more colleges and universities have been making the SAT and its rival, the ACT, optional. Even worse from the viewpoint of ETS, the SAT has been losing ground to the ACT, its competitor that already dominated college admission in the Midwest and South. In Michigan, for example, according to the Washington Post, the number of people taking the SAT dropped by more than half from 2006 to 2013. In Illinois, the drop was 46%. The SAT has long been #1 on the West Coast, but its dominance is now in danger of being lost. Even in such Northeastern states as Pennsylvania, New Hampshire, and Vermont, fewer students in absolute terms now take the SAT than did in 2006. The Northeast is ETS's home ground.

In response, Educational Testing Service — which is neither educational nor a service — has announced a major revision in its marquee product, the SAT.

Some of ETS's changes will make the SAT even worse, not better. Removing the "penalty" for wrong (as opposed to blank) answers is one. Previously, ETS took off 1/4 point for wrong answers on the SAT. No longer will they do so.

Why is this a change for the worse?

An example will illustrate. Imagine a student — let's call him "Ernie" — who does not even read the test. He just answers randomly. Maybe he simply shades "B" "B" "B," all the way down the answer sheet. Each item has five alternatives, A, B, C, D, and E, so Ernie would get about 20 of 100 items correct, on average. He would also get about 80 wrong. Using the old SAT formula, his score would be his number correct, 20, minus 1/4 times his number wrong, 80. Since (1/4) x 80 = 20, Ernie's total score would be 20 - 20 = 0.
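For readers who want to check the arithmetic, the old rule can be written out as a short Python sketch. The function name `formula_score` is an illustrative assumption, not anything ETS publishes; the only facts it uses are the ones stated above: one point per right answer, 1/4 point off per wrong answer, nothing for blanks.

```python
from fractions import Fraction

def formula_score(num_right, num_wrong, penalty=Fraction(1, 4)):
    """Raw score under the old rule: rights minus 1/4 of wrongs.

    Blank answers neither add nor subtract, so they are not passed in.
    """
    return num_right - penalty * num_wrong

# Ernie: about 20 right and 80 wrong out of 100 five-choice items.
print(formula_score(20, 80))  # 0
```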

That raw score has real meaning. Getting a zero is appropriate. Ernie has shown no knowledge. He did not even read the test. His score should be zero. He got no penalty for guessing; his guessing simply was not rewarded. That's why I put quotation marks around "penalty" in my first use of it. ETS then translates this zero into 200 on its arcane 200-to-800 scale.

Imagine a student who did have some knowledge and ability. Call her Suzie. Suzie can read, perhaps slowly, and she can think, perhaps deeply. She answers only 15 of the 100 questions on this imaginary SAT, but because she reads carefully and thinks deeply, she gets all 15 correct. Under the rules up to now, it makes no difference whether she guesses on all the rest, some of the rest, or none of the rest. On average, her raw score will be 15. Not a good raw score, but far better than the zero that random guessing provided Ernie.
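The claim that blind guessing makes no difference to Suzie's average score under the old rule can be verified with a small sketch. The helper name `expected_old_score` is hypothetical, and the sketch assumes that every blind guess is a uniform pick among five alternatives.

```python
from fractions import Fraction

def expected_old_score(known_right, blind_guesses, choices=5):
    """Expected raw score under the old rule (1/4 point off per wrong answer).

    known_right   -- items answered correctly from real knowledge
    blind_guesses -- additional items answered at random
    """
    p = Fraction(1, choices)                      # chance a blind guess is right
    expected_right = known_right + p * blind_guesses
    expected_wrong = (1 - p) * blind_guesses
    return expected_right - Fraction(1, 4) * expected_wrong

# Suzie knows 15 items; she may guess on none, some, or all of the other 85.
for guesses in (0, 40, 85):
    print(guesses, expected_old_score(15, guesses))
```

Each blind guess gains 1/5 of a point on average and loses (4/5) x (1/4) = 1/5 of a point on average, so all three lines print an expected score of 15.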

Beginning in 2016, copying the ACT, ETS will no longer take off a fraction for wrong (as opposed to blank) answers on the SAT. As always, students will answer every item they reach and feel sure of. The tests are timed, of course. Students who read more slowly, are less familiar with the format, or are simply less verbally glib (including on the math test) will not reach every item.

In the future, test-wise students will use their final 30 seconds to answer every item, perhaps shading "B" "B" "B" all the way down, like Ernie, never even reading the rest of the items. Now ETS will reward guessing, so their scores will improve. Ernie, for example, will get 20, beating out Suzie's original score of 15, unless Suzie resorts to a similar strategy. Yet Suzie knew something. Of those items she answered, she got every one right. Ernie has not even demonstrated that he can read, only the ability to blacken the little ovals under column "B."
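The same comparison under the new rights-only rule can be sketched as follows. The function name `expected_rights_only` is an assumption made for illustration, and the figure for Suzie guessing on her remaining 85 items is derived here from the numbers above rather than stated in them.

```python
def expected_rights_only(known_right, blind_guesses, choices=5):
    """Expected raw score when wrong answers cost nothing.

    Every blind guess adds 1/choices of a point on average.
    """
    return known_right + blind_guesses / choices

ernie = expected_rights_only(0, 100)          # reads nothing, shades "B" throughout
suzie_blank = expected_rights_only(15, 0)     # answers 15, leaves 85 blank
suzie_guess = expected_rights_only(15, 85)    # answers 15, shades the other 85

print(ernie, suzie_blank, suzie_guess)  # 20.0 15.0 32.0
```

Under these assumptions Suzie's knowledge puts her ahead of Ernie only if she also fills in the 85 items she never reached.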

If you run out of time, before you put your pencil down, just circle “B,” “B,” “B,” all the way down the page.

Clearly the new policy is anti-intellectual. It tells students that right answers are important, even if achieved by gaming the system. It implies that the score is what counts, rather than the knowledge or thinking that it represents.

As well, the new policy turns out to be biased. It is anti-black, anti-female, anti-rural, and anti-poor. Indeed, it hurts everyone who does not match up well with the socioeconomic status of white male residents of Princeton, NJ, where ETS staffers live — whom we might call the "in-group."

That's because out-group members will not get taught to guess randomly — at least not to the degree that suburban white kids will. Most poor people, racial minorities, rural people, etc., do not take the Princeton Review, the coaching school that for decades has helped children of the Establishment "game" the SAT. Princeton Review alumni will guess "B" "B" "B." Others, not so much.

To be sure, the instructions will inform students that nothing is deducted for wrong answers. But they will not be convincing. Perhaps they will say, "Informed guesses can help your score. If you think you can rule out an alternative, you are advised to choose among the remaining answers even if you are not sure which is correct." But they will not say, "You are an idiot if you do not fill in something for every item." Such a statement would come across as too anti-intellectual, even though it is accurate. Nor will ETS suggest, "Simply fill in 'B' 'B' 'B' all the way down." Princeton Review will.

We can infer that ETS will do a bad job of telling students when to guess because in the past they have done a bad job of telling students when to guess. With the old (and still current) scoring system, it is mathematically certain that students should guess randomly whenever they can eliminate one or more alternatives as definitely wrong. If Ernie eliminates one wrong answer on each of 100 questions, then guesses randomly among the four alternatives that remain, he will get, on average, about 25 items correct (1/4 of 100). He will also get 75 wrong. At present, ETS will subtract ¼ of 75 or 18.75 for these wrong (as opposed to blank) replies. Ernie’s raw score will be 25 – 18.75 = 6.25, significantly better than the 0 he “earned” if he did not even read the items. If Ernie can eliminate two choices, he will get about 33.33 items correct (guessing randomly among the remaining three alternatives). ETS will subtract ¼ of 66.67 or 16.67, leaving Ernie with a raw score of about 16.67, much better than 0.
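The pattern in these figures can be tabulated with a brief sketch. The function name `expected_after_elimination` is hypothetical; the sketch assumes 100 five-choice items, the same number of alternatives ruled out on every item, and a uniform random pick among those that remain. The row for three eliminations is an extrapolation from the same formula, not a figure given above.

```python
from fractions import Fraction

def expected_after_elimination(eliminated, items=100, choices=5):
    """Expected old-rule raw score when `eliminated` wrong alternatives
    are ruled out on every item and the rest is a random pick."""
    p = Fraction(1, choices - eliminated)         # chance each guess is right
    expected_right = items * p
    expected_wrong = items * (1 - p)
    return expected_right - Fraction(1, 4) * expected_wrong

for k in range(4):
    print(k, round(float(expected_after_elimination(k)), 2))
```

This prints 0.0, 6.25, 16.67, and 37.5 for zero through three eliminations: under the old rule the expected payoff of guessing is positive as soon as even one alternative can be ruled out.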

So what has ETS been telling students to do, regarding this type of intelligent guessing? Here is an example, from Real SATs, a 400-page ETS publication intended to advise high school students how to take the SAT:

As a last resort, if you can eliminate any choices as definitely wrong, it may pay you to make a guess among the other choices.

Test preparation materials from ETS and the College Board do not level with students the way materials and courses from Princeton Review do. Therefore unequal access to coaching is one more barrier that confronts “out-group” students such as poor people, minorities, and rural residents.

This advice is so weak as to be misleading. Intelligent guessing should not be considered a “last resort.” It will – not “may” – “pay you to make a guess among the other choices.”

Why does ETS so downplay guessing? Well, ETS claims “the SAT I is designed to help predict your freshman grades, so that admissions officers can make better decisions…,” according to page 4 of Real SATs. To do this, the SAT supposedly tests “your reasoning,” “how well you will do in college,” “your abilities,” “your own academic development,” to quote from other early pages of the book. To emphasize that students should guess randomly when they can eliminate an alternative or two is not seemly. It’s a gaming tactic. It does not belong in the same conversation with these other important “abilities.” Surely that’s why ETS has been doing a bad job of telling students when to guess.

The new policy – rewarding guessing – is even less defensible intellectually, as we have seen. Hence ETS will surely be even less forthright about how to “game” it. Many high school counselors and college admissions staff will only compound the problem. Here is an example. Discussing the change, the dean of admissions at St. Lawrence University said, "It will encourage students to consider the questions more carefully and to attempt them, where before if a cursory glance at a question made it seem too complex to them, they may go ahead and skip that question." So he would suggest that students “consider the questions carefully” that they don’t get to. Of course, they don’t have time to do that! What they should do is not consider them at all, just blindly fill in answers. So he, like many others in the college admissions process, will be a source of misinformation to students seeking guidance on whether and how to guess.

Several strands of evidence suggest that many test-takers simply do not guess blindly, even though they should. For example, years ago, when preparing to testify in the important civil rights case Ayers v. Fordyce (see Wikipedia for a short summary), I came upon ACT scores for students across the state of Mississippi. SAT and ACT scores both correlate strongly with social class and race. Since Mississippi is at once the poorest and blackest state in the U.S., I was not surprised to learn that many students scored abysmally on the ACT.

I was surprised to learn that in Mississippi in the 1980s, about 7% of white students and 13% of black students scored below random on the ACT.

Ordinary lack of ability cannot account for scores below random. As we saw with Ernie, if one does not read (perhaps cannot read), one still scores randomly, so long as one can shade the little ovals on the answer sheet. To score worse than random is truly a bizarre accomplishment.

There is only one likely way to score below random:¹ probably the students didn't finish the ACT and didn't guess.

Suzie might be an example. If she worked doggedly, read slowly, thought deeply, answered every item she reached, and never used the last 30 seconds to guess, her score would wind up below random.

The guessing issue does not only affect poor test takers, whose scores wind up below random. Many students with scores above random also don't finish and don't guess. More than a dozen years ago, ETS changed how it scored its Graduate Record Exam (GRE), removing the subtraction for wrong (as opposed to blank) answers. Immediately, ETS observed that many test takers were still leaving many items blank, thus artificially lowering their scores. So far as I can tell, ETS then did nothing about this problem.² In the material on the GRE online as of March 2014, nowhere can I find advice to guess. The closest ETS comes is to tell would-be test-takers,

"For each of the two measures, a raw score is computed. The raw score is the number of questions you answered correctly."

That's not very close! The word "guess" appears nowhere in this section — indeed, nowhere on the entire website — not at "About the Test," "Scores," "How the Test Is Scored," nor even "Frequently Asked Questions." Yet anyone who has ever talked with a roomful of test takers knows that "Should we guess?" is perhaps their most frequently asked question.

Not only are minority students, poor people, and rural people less likely to get the word that they should guess randomly. Research shows they are also less likely to believe it. I have seen this myself, when trying to clue in African American students in Mississippi on how to "game" the GRE. Perhaps it's a matter of sophistication — whatever that is — or the narrower concept, test-wiseness. I concluded that students not in the "in-group" are more dutiful, more sincere in a way, perhaps more plodding. They are less likely to believe that one should do such a thing as answer "B" "B" "B" all the way down an answer sheet. Somehow it doesn't seem right to them.

I share their feeling. Blindly answering "B" "B" "B" all the way down an answer sheet isn't right. It's not an activity that should be rewarded. It shouldn't have anything to do with getting into college. It is anti-intellectual.

As well, "out-group" students are less credulous, less likely to believe what they're told. Sometimes this is good. It made them less likely to believe in the Vietnam War, for example.³ When taking standardized tests, however, it hurts them. Girls, too, are less likely than boys to follow advice to "game the system," which accounts for part of the gap between male and female scores on the GRE and ACT.

As a result of these differences in intellectual style, "out-group" students and girls will be even more disadvantaged by the new SAT than they are now by the old one. I have written elsewhere about how and why the SAT disadvantages African Americans and girls (see Eileen Rudert, ed., The Validity Of Testing In Education And Employment [DC: US Commission on Civil Rights, 1993], 41-45, 58-62, 73-91, 161; and "Gender Bias on SAT Items," with Phyllis Rosser and John Katzman, Amer. Educ. Research Assn., 4/1988, ERIC ED294915). In brief, the statistical tests to which ETS submits proposed new items guarantee that no questions that favor blacks over whites can ever appear on the final SAT. Neither can an item on the math test ever favor girls over boys. As a result, African Americans do badly enough already! Rural people, compared to students living in the advantaged suburbs of the world, do badly enough already. So do girls, on the math test. To add yet another source of disadvantage by this rule change seems gratuitous, sort of "piling on."

If the SAT did its job well, that might be another matter. Its job is, of course, to predict first-semester college grades. At most colleges, the SAT adds almost nothing to the prediction obtained simply from high school grade point average alone.⁴

ETS has known for decades that the SAT does not measure "scholastic aptitude." Some years ago, "SAT" stood for "Scholastic Aptitude Test." No more. In 1993 the U.S. Civil Rights Commission published the testimony of Nancy Cole, then vice-president of ETS, admitting that the SAT does not measure "aptitude" for college (see Eileen Rudert, ed., The Validity of Testing in Education and Employment, 59 for Cole). The next year, Cole having become president, ETS changed the test's name to the "Scholastic Assessment Test." A few years later, painfully aware that "Assessment Test" was redundant, ETS renamed the SAT once more. Now it merely stands for "S.A.T."; the initials mean nothing at all!

The name change also amounts to nothing at all, however, because most people don't know it occurred. Even in 2014, when asked what "SAT" stands for, college audiences across the U.S. chorus "scholastic aptitude test." Indeed, ETS has done little to popularize the change. Quite the opposite: owing to the invisible use of "scholastic aptitude test" all over ETS's home page, Google sends searches for the term to that site, even though neither "scholastic" nor "aptitude" appears visibly on that page.

Establishment parents hire college placement tutors to tell their children that the SAT doesn't measure aptitude, so they should still apply to college even after getting poor scores. Again, then, African Americans, rural students, children from poor families, etc., remain particularly vulnerable, more likely to infer that their low test scores mean they have low aptitude. That's too bad, because some of them, perhaps Suzie for one, would do fine in college. Beginning in 2016, when ETS rewards guessing, this problem will grow even worse. After 2015, some students will score below random on the SAT, as they do on the ACT. Then they can infer that they are really stupid, even though part of the reason they tested so poorly is that they didn't shade "B" "B" "B" after they answered all they had time for.

Sigh!

1 To be sure, one might have such a perverse way of thinking that one systematically chooses wrong alternatives. Since SAT and ACT items usually do not focus on religious or political opinions but on word usage and math, however, scoring below random owing to perversity seems unlikely.

2 This section rests partly on work by Jeri Grandy, personal communication, 9/2000.

3 See Lies My Teacher Told Me (NY: Simon & Schuster, 2007), 345-54.

4 Neither does the ACT. In MS, for example, adding ACT scores increased the correlation between HS GPA and first semester GPA at Alcorn U. from .55 to .57, a trivial increase. At MS State U., the increase was from .68 to .71.