Artificial-Intelligence

The CHEAT Benchmark

For those interested in issues around agentic AI and assessment, I’m excited to announce the launch of the CHEAT Benchmark (https://cheatbenchmark.org/). The CHEAT Benchmark is an AI benchmark like SWE-Bench Pro or GPQA Diamond, except this benchmark measures an agentic AI’s willingness to help students cheat. By measuring and publicizing the degree of dishonesty of various models, the goal of this work is to encourage model providers to create safer, better aligned models with stronger guardrails in support of academic integrity. ...

Connecting Prompt Writing to Other Genres of Writing

Rather than imagining “prompt engineering” as a new form of writing that appeared *ex nihilo* three years ago, I find it helpful to think about the ways this new kind of writing remixes existing forms of writing. For example, the primary goal of prompt engineering is getting a model to behave in a specific way. We do that by providing it with very clear, unambiguous instructions. There’s a clear connection to technical writing here. Some prompt engineering frameworks claim that adding phrases like “my job depends on it!” to a prompt can improve the quality of responses, so there’s likely an opportunity to draw in aspects of persuasive writing as well. &c. And of course there are the interesting differences between prompt writing and technical or persuasive writing, such as the difference in audience (when you write a prompt, your audience is an LLM). But it’s still the case that knowing something about your audience and how they think (in this case, knowing something about how LLMs work under the hood) can make you a more effective writer. ...

Democratizing Participation in AI in Education

tl;dr - Go play around with generativetextbooks.org and let me know what you think. Earlier this year I began prototyping an open source tool for learning with AI in order to explore ways generative AI and OER could intersect. I’m specifically interested in trying to combine the technical power of generative AI with the participatory power of OER, in order to both increase access to educational opportunity and improve outcomes for those students who access it. I did some preliminary writing on this topic back in July of 2023, calling the artifacts that result from combining generative AI and OER “generative textbooks” and have continued to ruminate on the topic. ...

"AI Models Don't Understand, They Just Predict"

“Generative AI models don’t understand, they just predict the next token.” You’ve probably heard a dozen variations of this theme. I certainly have. But I recently heard a talk by Shuchao Bi that changed the way I think about the relationship between prediction and understanding. The entire talk is terrific, but the section that inspired this post is between 19:10 and 21:50. Saying a model can “just do prediction,” as if there were no relationship between understanding and prediction, is painting a woefully incomplete picture. Ask yourself: why do we expend all the time, effort, and resources we do on science? What is the primary benefit of, for example, understanding the relationship between force, mass, and acceleration? The primary benefit of understanding this relationship is being able to make accurate predictions about a huge range of events, from billiard balls colliding to planets crashing into each other. In fact, the relationship between understanding and prediction is so strong that the primary way we test people’s understanding of the relationship between force, mass, and acceleration is by asking them to make predictions. “A 100 kg box is pushed to the right with a force of 500 N. What is its acceleration?” A student who understands the relationship will be able to predict the acceleration accurately; one who doesn’t, won’t. ...

Writing is Thinking: The Paradox of Large Language Models

Last week I had the amazing opportunity to speak at the 3rd Annual AI Summit at UNC Charlotte. The entire event was wonderful and the organizing team were terrific. My keynote wasn’t recorded, so I thought I would serialize it across a series of blog posts. This post is the first in that series, and this section of the talk was titled Writing Is Thinking. David McCullough said, “Writing is thinking. To write well is to think clearly. That’s why it’s so hard… We all know the old expression, ‘I’ll work my thoughts out on paper.’ There’s something about the pen that focuses the brain in a way that nothing else does.” ...

Gravity, Bandwidth, and Tokens: Fundamental Constraints on Design

Back in the mid-1990s I read an absolutely amazing article that had a lasting impact on my thinking. Despite looking for it several times over the years, I’ve never been able to find it again. This was the era of 14.4k, 28.8k, and 56k modems, when we used our home landlines to dial up and connect to the internet. The article’s main argument was that, just as architects have to understand and account for gravity in their designs of bridges and buildings, web architects have to understand and account for bandwidth in their website designs. Back in the day, including too many large images on a webpage could “weigh it down” to the point of “collapse.” Your 28.8k connection provided so little bandwidth to your home that you simply couldn’t download that much data in a reasonable amount of time, so after waiting a minute for the page to load you just gave up and went somewhere else. ...

Making AI a More Effective Teacher: Lessons from TPACK

Human Teachers and AI Teachers
Would you be surprised if you pulled a random person off the street, shoved them into a classroom full of students, and then found that they weren’t a particularly effective teacher? Of course not. And why wouldn’t that be surprising? Because effective teaching requires a great deal of knowledge and skill, and the person you pulled off the street most likely had no relevant training. ...

OELMs GitHub Updated and Demo Video

The OELMs source code has been updated on GitHub to include better documentation to help you get started as well as more examples of content and activities. As you may remember from last week’s post about the OELMs architecture, the design goal of Open Educational Language Models is to combine the technical power of generative AI with the participatory power of open education. To help you see how that works, the initial implementation in GitHub is sub-optimized in order to make it easier to understand how to contribute. As you see in the demo content in the screenshot from GitHub below, each “course” in an OELM is composed of three parts (as described last week): ...

The OELMs Architecture: The Technical Power of Generative AI Meets the Participatory Power of OER

Or, in which Generative AI meets OER meets Reusable Learning Objects. I’ve been working on fleshing out the architecture for Open Educational Language Models and have reached a point where it’s time to share a progress update. I’ve discussed the idea with several people and gotten some really excellent feedback, and building prototypes has helped me further refine my thinking.
Lessons from the Past: Separating Content from Presentation
I created my first website in the early 1990s, back when all we had was HTML. There was no CSS, no JavaScript. Actually, there weren’t even images in those first webpages. I was just surfing the web with Lynx, hitting the \ key to read the source code of other people’s sites, and learning how to build my own. The introduction of CSS - and the idea of creating a clean separation between content and presentation - was a revelation that totally changed my thinking and the way I designed websites. (In fact, my first print publication was a series of chapters about CSS in a book on “Dynamic HTML” in the late 90s.) ...

Generative AI and the Assessment Equivalent of Bloom's 2 Sigma Problem

Since the advent of ChatGPT it seems like everyone is talking about Bloom’s 2 sigma problem. The quick version is this: the average student who is taught using a combination of (1) one-on-one (or small group) tutoring and (2) mastery learning performs about two standard deviations better than the average student taught in a typical classroom setting. The “problem” in Bloom’s 2 sigma problem is that, while we know that this dramatic improvement in student learning is possible, we don’t know how to implement it at scale. We can barely afford one instructor for 30 students - there’s no way we can afford full-time individual tutors for each student. So, since Bloom and colleagues published this finding in the 1980s, many people have been working on this challenge included in their article: ...