Generative AI and the Assessment Equivalent of Bloom’s 2 Sigma Problem

Since the advent of ChatGPT, it seems like everyone is talking about Bloom’s 2 sigma problem. The quick version is this: the average student who is taught using a combination of (1) one-on-one (or small group) tutoring and (2) mastery learning performs about two standard deviations better than the average student taught in a typical classroom setting. The “problem” in Bloom’s 2 sigma problem is that, while we know this dramatic improvement in student learning is possible, we don’t know how to implement it at scale. We can barely afford one instructor for 30 students – there’s no way we can afford a full-time individual tutor for every student. So, ever since Bloom and colleagues published this finding in the 1980s, many people have been working on the challenge they posed in their article:

“If the research on the 2 sigma problem yields practical methods – which the average teacher or school faculty can learn in a brief period of time and use with little more cost or time than conventional instruction – it would be an educational contribution of the greatest magnitude.”

Generative AI has now revealed a similar problem – not with improving student learning, but with preventing student cheating. I’ll call it the “AI-immune assessment problem.” The quick version is this: we know there are a variety of ways to assess student learning that are 100% immune to cheating with AI, such as some kinds of performance assessment. The “problem” is that, while we know AI-immune assessment is possible, we don’t know how to implement it at scale. We’ve been studying performance assessment for even longer than we’ve known about the 2 sigma problem, but we typically don’t use performance assessments because they’re so time-consuming and expensive (sound familiar?).

Bloom’s 2 sigma problem has always been aspirational – “how can we help more students achieve their potential?” But generative AI has made the assessment problem existential – “how can we certify that a person has learned when it’s possible to succeed on assessments without having learned?” It looks like many people will be working on this challenge in the years ahead. To borrow Bloom’s style and structure:

“If the research on AI-immune assessment yields practical methods – which the average teacher or school faculty can learn in a brief period of time and use with little more cost or time than conventional assessment – it would be an educational contribution of the greatest magnitude.”

Like many others, I’ve often found that I can make progress on difficult problems by finding new ways of framing them. Hopefully this reframing of the “cheating with AI” problem offers a perspective that can help us make progress.