The CHEAT Benchmark

For those interested in issues around agentic AI and assessment, I’m excited to announce the launch of the CHEAT Benchmark (https://cheatbenchmark.org/). The CHEAT Benchmark is an AI benchmark like SWE-Bench Pro or GPQA Diamond, except this benchmark measures an agentic AI’s willingness to help students cheat. By measuring and publicizing the degree of dishonesty of …

Democratizing Participation in AI in Education

tl;dr – Go play around with generativetextbooks.org and let me know what you think. Earlier this year I began prototyping an open source tool for learning with AI in order to explore ways generative AI and OER could intersect. I’m specifically interested in trying to combine the technical power of generative AI with the participatory …

“AI Models Don’t Understand, They Just Predict”

“Generative AI models don’t understand, they just predict the next token.” You’ve probably heard a dozen variations of this theme. I certainly have. But I recently heard a talk by Shuchao Bi that changed the way I think about the relationship between prediction and understanding. The entire talk is terrific, but the section that inspired …