Why It Might Be Impossible to “AI-Proof” Written Assignments (and What We Can Do About It)

A significant amount of time, effort, and resources goes into training large language models (LLMs) to follow instructions. In fact, after the initial pre-training step, many models undergo an additional instruction-tuning step designed specifically to make them better at following instructions. If you’ve ever poked around Hugging Face and wondered why some models have “Instruct” in their name (like Llama-3-8B vs. Llama-3-8B-Instruct), this is why.
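For a concrete sense of what instruction tuning looks like from the user’s side, here is a minimal sketch (assuming the Hugging Face transformers library and access to the Meta-Llama-3-8B-Instruct repository) that prints the chat format the Instruct variant was tuned on, a structure the base model never saw during pre-training:

```python
# A minimal sketch: inspect the chat format an instruction-tuned model expects.
# Assumes the Hugging Face transformers library and access to the
# meta-llama/Meta-Llama-3-8B-Instruct repository.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [{"role": "user", "content": "Write a poem about a sunny day."}]

# apply_chat_template wraps the instruction in the special tokens and role
# headers used during instruction tuning; the base Llama-3-8B was never
# trained on this structure.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```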

While a wide range of prompt engineering frameworks exist, they all have one thing in common: they help you write clear, detailed, thorough, accurate instructions for an LLM to follow. LLMs can complete simple tasks given only simple instructions (“Write a poem about a sunny day”), but completing more complicated tasks requires more detailed instructions (e.g., see this 820-word ‘Updated Tutoring Prompt’ by Ethan Mollick, which instructs the LLM to act as a tutor). Because many models are specifically instruction-tuned as part of their training process, clearer instructions generally result in better outputs from the model.

What does this have to do with assessment? Leaving aside psychometric considerations like validity and reliability, the hallmark of a good assignment is that its directions are clear and unambiguous, giving students all the information they need to succeed. For example, if you want students’ essays to be five paragraphs long, you’d better specify that in the assignment description. (Can you imagine how students would react if you took off points because an essay was only four paragraphs long, when the assignment said nothing about length?) If you want students to describe at least three causes of the Revolutionary War, you’d better specify that in the assignment description. If you want students to “select all the correct answers below,” you’d better make that clear. &c.

In fact, each and every aspect of the assignment that could affect a student’s grade has to be clearly spelled out in the instructions in order for students to do the work well (and for you to grade the work fairly). And when your instructions are sufficiently detailed for students to do the work, they are also sufficiently detailed for an LLM to do it.

Or, to be more concise:

  • LLMs are trained and optimized to excel at following instructions, and
  • Assignments are instructions; therefore
  • LLMs are pretty good at doing assignments.

Of course, large language models can’t do extra-linguistic assignments like “Mix two chemicals to create a specific reaction,” or “Run a mile in under 10 minutes,” or other tasks that have to be performed in the real world. That’s why I’ve limited this discussion, in the title above, to “written assignments.” But if an assignment can be completed by writing, then, in theory, an LLM should be able to complete it. That holds for everything from something as small as “write the letter indicating the correct answer option to this multiple-choice question” to something as big as “design a research study to answer this question” to something completely different like “write a Python program that will monitor network traffic for suspicious activity.”
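To see how little translation an assignment needs before an LLM can attempt it, here is a minimal sketch (assuming the OpenAI Python SDK with an API key in the environment; the model name and assignment text are illustrative) in which an assignment description is passed to the model verbatim, exactly as a student would receive it:

```python
# A minimal sketch: hand an assignment description to a chat model verbatim.
# Assumes the OpenAI Python SDK; the model name and assignment are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

assignment = (
    "Write a five-paragraph essay describing at least three causes of the "
    "Revolutionary War. Support each cause with specific events."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": assignment}],
)
print(response.choices[0].message.content)
```

The specific SDK isn’t the point; any chat interface, including the free web versions, accepts the same pasted-in instructions.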

A more productive way of asking the question “How can I create an AI-proof written assignment?” is to ask “How can I create assignment instructions that are clear enough for each and every student to be able to understand and follow correctly, but that an LLM will not be able to understand and follow?”

Even though that’s a more useful way of asking the question, I’m afraid it doesn’t suggest many good strategies. The obvious strategy here is some kind of obfuscation – making the assignment instructions deliberately confusing. But it’s hard to imagine an obfuscation strategy that will confuse an LLM without confusing at least some of your students as well. That makes obfuscation an unacceptable strategy. 

At one time I thought an “omit key information” strategy could work: create an assignment description that leaves out necessary information that the student has access to but the LLM does not. Something like, “Write a two-page essay describing the influence of supply on demand. Describe this influence using the three specific examples we discussed in class.” On the surface, it seems like there’s no way for the LLM to complete this task. But it turns out this strategy is trivially easy to defeat – students can simply audio-record the class lecture (which many of them already do) and upload the audio file to the LLM together with the assignment instructions. The LLM will pull the three key examples out of the audio and use them to complete the assignment. Trying a similar approach by referencing material in the textbook (“be sure to include in your response the five topics discussed in Chapter 5”) is defeated just as easily: students take pictures of each page in the textbook and upload those to the LLM together with the assignment instructions. Again, the LLM will pull the necessary information out of the images and use it to complete the task.

Generally speaking, if a student has access to some piece of information, it will be trivial for them to give the LLM access to it, too. So an “omit key information” strategy doesn’t work.
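To illustrate how low the bar is, here is a minimal sketch (assuming the OpenAI Python SDK and a vision-capable model such as gpt-4o; the file names and assignment text are hypothetical) of the textbook-photo version of this workaround, with the page photos riding along with the instructions in a single request:

```python
# A minimal sketch of the workaround described above: photos of the relevant
# textbook pages are attached alongside the assignment instructions.
# Assumes the OpenAI Python SDK and a vision-capable model; file names are hypothetical.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def encode_image(path: str) -> str:
    """Return the image at `path` as a base64 string for the API."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


pages = ["chapter5_page1.jpg", "chapter5_page2.jpg"]  # hypothetical phone photos
instructions = (
    "Write a two-page essay describing the influence of supply on demand. "
    "Be sure to include the five topics discussed in Chapter 5 (pages attached)."
)

content = [{"type": "text", "text": instructions}] + [
    {
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{encode_image(p)}"},
    }
    for p in pages
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```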

Let’s pause to acknowledge that LLMs won’t always be able to complete assignments with 100% accuracy. But that’s small consolation for two reasons. First, many students looking to use LLMs for help with their homework will be super happy if the LLM can consistently score 80-90% on the assignments it completes on their behalf. For many students, B-grade work is totally acceptable. And second, models are getting better all the time. Today’s 80-90% accuracy will soon become 85-95%, then 90-95%, and so on.

So Now What?

If there’s no way we can design written assignments that will prevent students from having an LLM do the work for them, what are we to do? Our only option may be something that sounds, at first mention, even more impossible – helping students actually understand the relevance of the work we ask them to do. 

If students see the value in doing the work we ask them to do, they will do it gladly. But when an assignment has literally no meaning to a student beyond the score that is recorded in the gradebook, you might argue that the most rational behavior for them is to find the easiest path to maximizing that score (e.g., having an LLM do the work). Perhaps they believe that college is just a big game they’re playing to get the grades they need to pass the classes they need to get the piece of paper they need to make it through the first round of screening for the job they want. If none of their instructors have ever tried to disabuse them of this idea and persuade them to actually care about a class topic, why would students spontaneously start caring? (Hint: They won’t.)

Instructors aren’t required to make the constant, painstaking, exhausting, impossible-feeling effort necessary to persuade students that their course subjects are relevant, meaningful, and important. And many don’t, because it’s much easier to simply wield the power of the grade – “You’ll do that assignment because I said you will. And I’ll fail you if you don’t.” As we said about students using LLMs above, this “wield the grade as a bludgeon” approach is probably the most rational thing for instructors to do – it allows them to fulfill their teaching obligations with the least amount of effort. Who has the time or energy to do extra work that isn’t compensated or rewarded?

But what a tragedy it is when everyone takes the “most rational” path, with students minimizing the effort they expend on learning and instructors minimizing the effort they expend on teaching! This path not only minimizes effort, but also minimizes the joy, inspiration, self-discovery, transformative possibilities, and ennobling power of education. 

As the old saying goes, “they won’t care how much you know until they know how much you care.” We have to show our students that we care about them as people, and about their academic success, and that we care about our disciplines, and that there are valuable, meaningful, relevant lessons for them to learn with us. 

I deeply, fundamentally believe that education can be joyful, inspiring, and ennobling. And when students see value in learning a subject and come to care about it, they won’t have LLMs do their work for them. They’ll see that shortcut as the missed opportunity that it is – they’ll understand that “cheating themselves” is a real thing. But many students don’t see that now, and they’ll only come to see it if we help them. Doing the hard work of establishing relevance and value and meaning is more important now than ever before, and it may be the only sustainable answer to the assessment issues facing education. 

Perhaps – just maybe – the questions generative AI is forcing us to ask about assessment (and other things) will result in positive changes in education.