The CHEAT Benchmark

For those interested in issues around agentic AI and assessment, I’m excited to announce the launch of the CHEAT Benchmark (https://cheatbenchmark.org/). The CHEAT Benchmark is an AI benchmark like SWE-Bench Pro or GPQA Diamond, except this benchmark measures an agentic AI’s willingness to help students cheat. By measuring and publicizing the degree of dishonesty of various models, the goal of this work is to encourage model providers to create safer, better aligned models with stronger guardrails in support of academic integrity. ...

January 29, 2026 · David Wiley

Democratizing Participation in AI in Education

tl;dr - Go play around with generativetextbooks.org and let me know what you think. Earlier this year I began prototyping an open source tool for learning with AI in order to explore ways generative AI and OER could intersect. I’m specifically interested in trying to combine the technical power of generative AI with the participatory power of OER, in order to both increase access to educational opportunity and improve outcomes for those students who access it. I did some preliminary writing on this topic back in July of 2023, calling the artifacts that result from combining generative AI and OER “generative textbooks” and have continued to ruminate on the topic. ...

August 19, 2025 · David Wiley

"AI Models Don't Understand, They Just Predict"

“Generative AI models don’t understand, they just predict the next token.” You’ve probably heard a dozen variations of this theme. I certainly have. But I recently heard a talk by Shuchao Bi that changed the way I think about the relationship between prediction and understanding. The entire talk is terrific, but the section that inspired this post is between 19:10 and 21:50. Saying a model can “just do prediction,” as if there were no relationship between understanding and prediction, is painting a woefully incomplete picture. Ask yourself: why do we expend all the time, effort, and resources we do on science? What is the primary benefit of, for example, understanding the relationship between force, mass, and acceleration? The primary benefit of understanding this relationship is being able to make accurate predictions about a huge range of events, from billiard balls colliding to planets crashing into each other. In fact, the relationship between understanding and prediction is so strong that the primary way we test people’s understanding of the relationship between force, mass, and acceleration is by asking them to make predictions. “A 100 kg box is pushed to the right with a force of 500 N. What is its acceleration?” A student who understands the relationship will be able to predict the acceleration accurately; one who doesn’t, won’t. ...
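
As a quick illustration of the prediction a student would be asked to make, here is a minimal Python sketch that applies Newton's second law (a = F / m) to the box in the example. The function name `predict_acceleration` is purely illustrative, and the sketch assumes a frictionless surface so that the 500 N push is the net force on the box.

```python
def predict_acceleration(net_force_newtons: float, mass_kg: float) -> float:
    """Predict acceleration in m/s^2 from Newton's second law, a = F / m."""
    if mass_kg <= 0:
        raise ValueError("mass must be positive")
    return net_force_newtons / mass_kg


# The box from the example: 100 kg pushed with 500 N
# (assumed here to be the net force, i.e. no friction).
acceleration = predict_acceleration(500.0, 100.0)
print(f"{acceleration} m/s^2")  # 5.0 m/s^2
```

Under those assumptions the predicted acceleration is 5 m/s² to the right, which is the answer a student who understands the relationship would give.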

July 9, 2025 · David Wiley

Writing is Thinking: The Paradox of Large Language Models

Last week I had the amazing opportunity to speak at the 3rd Annual AI Summit at UNC Charlotte. The entire event was wonderful and the organizing team were terrific. My keynote wasn’t recorded, so I thought I would serialize it across a series of blog posts. This post is the first in that series, and this section of the talk was titled Writing Is Thinking. David McCullough said, “Writing is thinking. To write well is to think clearly. That’s why it’s so hard… We all know the old expression, ‘I’ll work my thoughts out on paper.’ There’s something about the pen that focuses the brain in a way that nothing else does.” ...

May 20, 2025 · David Wiley

Gravity, Bandwidth, and Tokens: Fundamental Constraints on Design

Back in the mid-1990s I read an absolutely amazing article that had a lasting impact on my thinking. Despite looking for it several times over the years, I’ve never been able to find it again. This was the era of 14.4k, 28.8k, and 56k modems, when we used our home landlines to dial up and connect to the internet. The article’s main argument was that, just as architects have to understand and account for gravity in their designs of bridges and buildings, web architects have to understand and account for bandwidth in their website designs. Back in the day, including too many large images on a webpage could “weigh it down” to the point of “collapse.” Your 28.8k connection provided so little bandwidth to your home that you simply couldn’t download that much data in a reasonable amount of time, so after waiting a minute for the page to load you just gave up and went somewhere else. ...
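
To make the "weight" of a page concrete, here is a rough Python sketch of the arithmetic. The page composition (20 KB of HTML plus four 50 KB images) is a made-up example, not a figure from the article, and the calculation assumes the modem's full nominal bit rate with no protocol overhead, so real load times would have been worse.

```python
def download_seconds(page_bytes: int, modem_bits_per_second: int) -> float:
    """Best-case transfer time for page_bytes at the modem's nominal bit rate."""
    return page_bytes * 8 / modem_bits_per_second


# Hypothetical page: 20 KB of HTML plus four 50 KB images (assumed sizes).
page_bytes = (20 + 4 * 50) * 1000

for label, bps in [("14.4k", 14_400), ("28.8k", 28_800), ("56k", 56_000)]:
    print(f"{label}: {download_seconds(page_bytes, bps):.0f} s")
# 14.4k: 122 s
# 28.8k: 61 s
# 56k: 31 s
```

Even in this best case, the hypothetical page takes about a minute on a 28.8k connection, which is roughly the point at which a visitor would give up.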

April 17, 2025 · David Wiley

OELMs Github Updated and Demo Video

The OELMs source code has been updated on GitHub to include better documentation to help you get started as well as more examples of content and activities. As you may remember from last week’s post about the OELMs architecture, the design goal of Open Educational Language Models is to combine the technical power of generative AI with the participatory power of open education. To help you see how that works, the initial implementation in GitHub is sub-optimized in order to make it easier to understand how to contribute. As you see in the demo content in the screenshot from GitHub below, each “course” in an OELM is composed of three parts (as described last week): ...

February 10, 2025 · David Wiley

The OELMs Architecture: The Technical Power of Generative AI Meets the Participatory Power of OER

Or, in which Generative AI meets OER meets Reusable Learning Objects. I’ve been working on fleshing out the architecture for Open Educational Language Models and have reached a point where it’s time to share a progress update. I’ve discussed the idea with several people and gotten some really excellent feedback, and building prototypes has helped me further refine my thinking. Lessons from the Past: Separating Content from Presentation. I created my first website in the early 1990s, back when all we had was HTML. There was no CSS, no JavaScript. Actually, there weren’t even images in those first webpages. I was just surfing the web with Lynx, hitting the / key to read the source code of other people’s sites, and learning how to build my own. The introduction of CSS - and the idea of creating a clean separation between content and presentation - was a revelation that totally changed my thinking and the way I designed websites. (In fact, my first print publication was a series of chapters about CSS in a book on “Dynamic HTML” in the late 90s.) ...

January 27, 2025 · David Wiley

Where Open Education Meets Generative AI: OELMs

Prelude. The extraordinary woman who mentored me through graduate school and co-chaired my PhD committee, Dr. Laurie Nelson, frequently talked to me about the idea of “current best thinking.” Characterizing something as your “current best thinking” gives you permission to share where you are in your work while simultaneously making it clear that your thinking will still evolve in the future. It is critically important to remember that both open education and generative AI are tools and approaches - they’re means to an end, methods for accomplishing a goal or solving a problem. I’m interested in solving problems of access and effectiveness in education. I think open education and generative AI have a lot to offer toward solutions to these problems. But I want to, from the outset, caution all of us (myself included) against becoming enamored with either open education or generative AI in and of themselves. As they say, you should fall in love with your problem, not your solution. ...

December 13, 2024 · David Wiley

Why It Might Be Impossible to “AI-Proof” Written Assignments (and What We Can Do About It)

A significant amount of time, effort, and resources goes into training large language models (LLMs) to follow instructions. In fact, after the initial pre-training step, many models are specifically instruction-tuned in order to make them better at following instructions. If you’ve ever been poking around Hugging Face and wondered why some models have “Instruct” in their name (like Llama-3-8B vs Llama-3-8B-Instruct), this is why. While a wide range of prompt engineering frameworks exist, they all have one thing in common: they help you write clear, detailed, thorough, accurate instructions for an LLM to follow. LLMs can complete simple tasks given only simple instructions (“Write a poem about a sunny day”), but in order to complete more complicated tasks they need more detailed instructions (e.g., see this 820-word ‘Updated Tutoring Prompt’ by Ethan Mollick that instructs the LLM to act as a tutor). Because many models are specifically instruction-tuned as part of their training process, clearer instructions generally result in better outputs from the model. ...

July 1, 2024 · David Wiley

The Musician's Rule and GenAI in Education

Over four years ago I described what I called the Musician’s Rule. The key insight behind the Musician’s Rule can be grasped by reflecting on two short scenarios. Imagine what would happen if: (1) a person with no musical training is given a $1M Stradivarius violin and asked to play it; (2) a person with a graduate degree in violin performance and decades of experience playing in recitals and concerts is given a $30 middle school orchestra rental violin and asked to play it. Which music will be the most enjoyable? Take a minute and really try to imagine what each of those mini-concerts would sound like. One hundred times out of one hundred, the music made by the person with training and experience sounds the best. ...

June 17, 2024 · David Wiley