Footnotes on Epicycles

The glow of faux precision [or] Caveon, cave off

Tue 28 Dec 2010 09:59 AM

The NY Times has a story about Caveon, a firm that uses forensic methods to identify students who are cheating on standardized tests. There are reasons to be dubious both about standardized testing and about automated cheater detection. Cases can be made for both, but the deep problem is that they inevitably have an aura of faux precision. Two examples:

1. Cheating is on the rise all over the place, the story says. According to a state functionary, since Caveon "began working for Mississippi in 2006, cheating has declined about 70 percent." The problem, of course, is that there is no independent measure for how much cheating there is. If there were, then that independent measure could be used to identify cheaters without paying Caveon.

2. The criteria used to identify cheaters all seem sensible enough: too many correlated errors, too much variance in performance from section to section, lots of erasing (which allegedly is a sign that someone later cleaned up the answers), doing well on hard questions but poorly on easy ones (although this might occur because the student is overconfident or bored with easy questions), and so on. Even though some of them are only suggestive, such factors can combine to make a convincing case for cheating.

Caveon's actual algorithm is proprietary, but the article says that it calculates the probability that the particular array of factors might occur by chance. We are told, "When the anomalies are highly unlikely - their random occurrence, for example, is greater than one in one million - Caveon flags the tests for further investigation by school administrators." The deep problem here is that there is no natural probability model for the non-cheating test taker, but the precision of 1-in-a-million only makes sense given some defined probability model. For example, students of some backgrounds might have a hard time with some so-called 'easy' questions or an easy time with some 'hard' ones. The probability that they would do something that looks like an anomaly would be pretty high; higher, anyway, than the probability that would result from rolling dice to fill in bubble sheets.
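To make the point concrete, here is a toy calculation. It is only a sketch and has nothing to do with Caveon's actual (unpublished) method. Every number in it is a made-up assumption: a test with 10 'easy' and 10 'hard' questions, an 'anomaly' defined as getting at most 5 easy questions right while getting at least 8 hard ones right, independent answers, and two hypothetical honest students with different per-question success rates. The function names are likewise just for illustration.

```python
from math import comb


def binom_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_most(n, k, p):
    """Probability of k or fewer successes in n trials."""
    return sum(binom_pmf(n, i, p) for i in range(k + 1))


def prob_at_least(n, k, p):
    """Probability of k or more successes in n trials."""
    return sum(binom_pmf(n, i, p) for i in range(k, n + 1))


def anomaly_probability(p_easy, p_hard, n_easy=10, n_hard=10,
                        max_easy_right=5, min_hard_right=8):
    """Chance that an honest student shows the 'anomaly': few easy
    questions right AND many hard questions right. Assumes (hypothetically)
    that every answer is independent of the others."""
    few_easy = prob_at_most(n_easy, max_easy_right, p_easy)
    many_hard = prob_at_least(n_hard, min_hard_right, p_hard)
    return few_easy * many_hard


# Hypothetical honest student A: 'easy' questions really are easy.
p_typical = anomaly_probability(p_easy=0.9, p_hard=0.4)

# Hypothetical honest student B: background makes some 'easy' questions
# hard and some 'hard' questions easy.
p_atypical = anomaly_probability(p_easy=0.6, p_hard=0.7)

print(f"Student A, chance of anomaly: {p_typical:.2e}")
print(f"Student B, chance of anomaly: {p_atypical:.2e}")
print(f"Ratio B/A: {p_atypical / p_typical:.0f}")
```

Neither student cheats, yet the same answer sheet is vanishingly rare under one set of assumptions and unremarkable under the other. The headline figure depends entirely on which model of the honest test taker is plugged in, and that is the part we are not shown.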

I sympathize with Walter M. Haney, who is quoted in the article complaining that Caveon's methods haven't been published and so aren't open to scrutiny. As he says, "You just don't know the accuracy of the methods and the extent they may yield false positives or false negatives." The CEO of Caveon replies that "the company had not published its methods because it was too busy serving clients. But the company's chief statistician is available to explain Caveon's algorithms to any client who is curious." This doesn't seem like enough. The people who are qualified to evaluate the reliability of Caveon's algorithms are experts in statistics and educational testing, not the clients of Caveon who are administering tests.

Of course, people generally tend to be dazzled by mock precision. So part of the blame might go to the Times' reporting rather than to Caveon. Yet Caveon profits from this general glow that surrounds quantitative measures, and the persuasive power of secret algorithms makes Haney's criticism all the stronger.