How Generative AI Affects Open Educational Resources

This is the middle section of my September 19, 2024 presentation, Why Open Education Will Become Generative AI Education. I’m pre-posting some of the presentation content because the announcement of the presentation has generated a very active conversation. Next week I hope to post the first section of the presentation, which outlines the reasons why people who care deeply about affordability, access, and improving outcomes should consider shifting their focus away from OER (as we have understood it for the last 25+ years) and toward generative AI. Or, using the language I introduce below, from “traditional OER” to “generative OER.”

Like the internet before it, generative AI is radically transforming many aspects of society. Generative AI is already having a profound effect on the ways OER are authored, revised, and remixed. And significantly more dramatic impacts are possible if we will reach for them.

Traditional OER

We’re all familiar with what I’ll call “traditional OER”. Traditional OER include openly licensed textbooks, chapters, syllabi, assessments, images, videos, simulations, interactives, etc. Throughout this presentation, when I say “OER” I’m using the term as defined by Creative Commons, The William and Flora Hewlett Foundation, and many others. That definition is:

“Open Educational Resources (OER) are teaching, learning, and research materials that reside in the public domain or have been released under an open license that permits their free use and re-purposing by others.”

Generative AI profoundly affects the ways that traditional OER are (1) authored and (2) revised and remixed.

Authoring Traditional OER

Prior to the release of ChatGPT, traditional OER were entirely “hand-crafted,” meaning no generative AI was used in their creation. Since the release of ChatGPT, many traditional OER have been “AI-drafted,” meaning people have used generative AI to create the first drafts of their OER. (And then, hopefully, subjected these drafts to rigorous review and correction as necessary.)

In the hand-crafted approach to OER authoring, creating a first draft can take days, weeks, months, or even years depending on the scope and ambition of the project. In the AI-drafted approach to OER authoring, creating a first draft can take minutes, hours, days, or weeks depending on the scope and ambition of the project. Generative AI reduces the time and effort required to get to a first draft by at least an order of magnitude (divide the previous amount of authoring time by 10) and sometimes even two orders of magnitude (divide the previous amount of authoring time by 100) for repetitive tasks, like drafting a large bank of assessment items. Quality assurance processes like peer review and technical editing are still critically important, but the process of getting to a complete first draft often accounts for the majority of the time spent authoring OER.

Because time has always been one of the biggest barriers to the creation of OER, it isn’t hard to imagine a future in which the overwhelming majority of new OER are AI-drafted instead of entirely hand-crafted. And it seems likely that philanthropy, which seeks to maximize the amount of good it can do in the world per dollar spent, will strongly encourage (if not require) grantees to take an AI-drafted approach to creating new OER in the future.

Revising and Remixing Traditional OER

But the effects of the AI-drafted approach on revising and remixing OER are perhaps even more impactful than the effects on authoring. While open licenses make it legal to revise and remix OER, that permission doesn’t magically grant teachers, learners, or other users the time and expertise necessary to actually do the revising and remixing.

Creating a simplified version of an existing text for learners of English as a second language is a common example of revising OER. When done manually, getting to the first draft can be an hours-long task. When done using generative AI, getting to the first draft is a minutes-long task. And research has demonstrated that instructors’ levels of engagement in revise and remix behaviors are inversely correlated with the amount of time and effort necessary to complete revise or remix tasks (see Hilton et al., 2012). If engagement in revise and remix activities falls as the time and effort they require rises, and AI-drafted approaches dramatically reduce that time and effort, then we should see a significant increase in the revising and remixing of OER in the future.

And a lack of time isn’t the only obstacle to revising and remixing OER. Most teachers, learners, and other users have either no expertise or very limited expertise in the skills necessary to perform in-demand forms of revise and remix, like translating an OER from one language into another. Research shows that the productivity gains of generative AI are highest among lower-skilled workers (Brynjolfsson, 2023; Dell’Acqua, 2023). Applying this finding to our example of translating a text into another language, generative AI will be a lot more helpful to a person who doesn’t speak one of the languages involved than it will be to a person who speaks both languages. Consequently, AI-drafted approaches unlock revise and remix activities that were previously impossible or impractical. This should both broaden the kinds of revising and remixing happening in the future and improve their quality.

Thanks to generative AI, we may finally be entering the long-awaited golden age of revise and remix.

Generative OER

Generative AI makes possible a new kind of OER that I’ll call “generative OER”. These are OER whose purpose is not to be studied directly by learners or used directly by teachers (like traditional OER are). Generative OER are OER whose purpose is to help learners, teachers, and other users create other OER. Generative OER include openly licensed prompts and openly licensed model weights.

Open Prompts

Many of the prompts written by first-time users of generative AI are relatively simple. They might consist of a short phrase or a few sentences. These basic prompts are unlikely to be eligible for copyright protection.

However, prompts eliciting more sophisticated behavior from a generative AI model can be hundreds or thousands of words long. These more creative prompts are likely subject to the same automatic copyright protection as other creative works. This means that for teachers, learners, and other users to be able to engage in the 5R activities (retain, reuse, revise, remix, and redistribute) with these far more pedagogically powerful prompts, the prompts will have to be openly licensed.

Users need to be able to revise and remix prompts for two reasons. First, the open education movement has recognized for decades how important it is for users to be able to localize OER to fit their local language, culture, and circumstances. Prompts are no different in this regard. Users must be able to revise and remix prompts so their results are helpful and appropriate in a specific user’s language, culture, and circumstances.

Second, users need to be able to localize prompts so they will “work” in the context of different generative AI models. Different models respond differently to the same prompts, and a sophisticated prompt optimized for Claude 3.5 Sonnet will likely need to be adapted to perform similarly with Llama 3.1 405B. And while the behaviors of frontier models may differ only slightly from one another, they differ significantly from the behaviors of quantized models that have been adapted to run on local hardware like a laptop without an internet connection. These locally runnable models will be one of the keys to distributing the benefits of generative AI to people around the globe, and users must be able to adapt complex prompts designed for frontier models to work with these offline models.
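To make the idea of localizing a prompt for a different model a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `OpenPrompt` record, the `adapt_prompt` helper, the model labels, and the deliberately crude `simplify_for_small_model` rewrite are all hypothetical, and real prompt adaptation would involve testing against the target model. In this sketch the license simply carries over to the adapted copy.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class OpenPrompt:
    """Hypothetical record for an openly licensed prompt."""
    text: str
    license: str                        # e.g. an open license such as CC BY 4.0
    target_model: str                   # the model the prompt was tuned for
    adapted_from: Tuple[str, ...] = ()  # earlier target models, oldest first


def adapt_prompt(prompt: OpenPrompt, new_model: str,
                 transform: Callable[[str], str]) -> OpenPrompt:
    """Return a revised copy of the prompt for a different model.

    The license carries over and the lineage is recorded; the original
    prompt object is left untouched.
    """
    return replace(
        prompt,
        text=transform(prompt.text),
        target_model=new_model,
        adapted_from=prompt.adapted_from + (prompt.target_model,),
    )


def simplify_for_small_model(text: str) -> str:
    """Illustrative revision only: keep the first instruction and add a
    short, explicit output constraint."""
    first_sentence = text.split(". ")[0].rstrip(".")
    return first_sentence + ". Answer in three short sentences."


original = OpenPrompt(
    text=("You are a patient tutor for an introductory biology course. "
          "Ask the learner what they already know before explaining. "
          "Use analogies drawn from everyday life."),
    license="CC BY 4.0",
    target_model="frontier-model",
)

localized = adapt_prompt(original, "local-quantized-model",
                         simplify_for_small_model)
print(localized.text)
```

The point of the sketch is that revising a prompt creates a derivative work, which is only permissible when the original prompt's license allows it.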

Revising and remixing existing prompts in order to optimize their performance in new user contexts and new model contexts is a form of prompt engineering. Inasmuch as many prompts will be copyrighted by default, legal permission to revise and remix them (to engage in the necessary prompt engineering) will be critical to ensuring the widespread impact and benefit of generative AI for everyone.

Open Weights

There are multiple efforts underway to define what constitutes an “open” generative AI model (White et al., 2024; OSI, 2024). Rather than trying to contribute to that conversation, here I adopt a lowest-common-denominator definition focused only on model weights. “Open weights” are generative AI model weights that are licensed in a manner granting users permission to engage in the 5R activities.

While foundation models like GPT-4o, Claude, Gemini, and Llama have truly impressive general knowledge capabilities, their fine-tuning stages are generally designed to help them follow instructions accurately (cf. the availability of “base” and “instruct” versions of models on Hugging Face). Foundation models are not designed to have pedagogical knowledge or to behave pedagogically. They may also lack the specialized knowledge needed by learners, teachers, and other users in some domains. Out of the box, these models are not optimized to support teaching and learning. Consequently, if generative AI is to meet its potential for supporting teaching and learning, users must be able to revise and remix the model weights directly.

The specific practices represented by some of the 5Rs change in the context of model weights. In the context of open weights, “revise” might refer to techniques like fine-tuning, where a model’s weights are updated through additional training on curated datasets. Fine-tuning allows targeted adjustments to be made to a model’s behavior, enabling it to perform specialized tasks. For example, an open weights model might be fine-tuned on a large dataset of interactions between learners and expert tutors, in order to make the model behave more like an expert tutor.
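As a rough illustration of what "updating weights through additional training on a curated dataset" means mechanically, here is a toy sketch in plain Python. It is an assumption-laden stand-in, not how a large language model is actually fine-tuned: the "model" is a two-weight linear function, the "pretrained" weights and the "curated" dataset are invented, and the function names are hypothetical. The shape of the process is the same, though: start from existing weights, then nudge them with gradient descent until the model fits the new examples better.

```python
def predict(weights, features):
    """Linear model: a weighted sum of the input features."""
    return sum(w * x for w, x in zip(weights, features))


def mean_squared_error(weights, dataset):
    """Average squared gap between predictions and target values."""
    return sum((predict(weights, x) - y) ** 2 for x, y in dataset) / len(dataset)


def fine_tune(weights, dataset, learning_rate=0.05, epochs=500):
    """Return new weights after gradient descent on the curated dataset.

    The starting weights are copied, so the original model is preserved.
    """
    weights = list(weights)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in dataset:
            error = predict(weights, x) - y
            for i, xi in enumerate(x):
                grads[i] += 2 * error * xi / len(dataset)
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]
    return weights


# Invented "pretrained" weights: the model currently computes y = 1*x + 0.
pretrained = [1.0, 0.0]

# Invented "curated" examples of the behavior we want: y = 2*x + 1.
# The second feature is a constant 1.0 so the second weight acts as a bias.
curated = [([1.0, 1.0], 3.0), ([2.0, 1.0], 5.0), ([3.0, 1.0], 7.0)]

tuned = fine_tune(pretrained, curated)
print(mean_squared_error(pretrained, curated), mean_squared_error(tuned, curated))
```

Revising a real open weights model follows the same loop at vastly larger scale, which is why permission to modify the weights themselves matters.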

In the context of open weights, “remix” might involve techniques like model merging or model distillation. In model distillation, a smaller “student” model is trained using the output of a larger “teacher” model. This process essentially compresses the larger model’s knowledge and capabilities into the smaller model. One of the best-known examples here is DistilBERT, created by Hugging Face. The Hugging Face team distilled Google’s BERT model (the “teacher” model) into DistilBERT (the “student” model), reducing its size by 40% while retaining 97% of its language understanding capabilities and making it 60% faster. The smaller DistilBERT model can be run on a local device without access to the internet. (Again, the ability to run generative AI models locally is critical for a number of reasons, including promoting access in low-connectivity areas, protecting user privacy, and decentralizing and reducing energy consumption.)
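For readers who want to see what "training on the output of a teacher model" looks like, here is a minimal sketch of a common distillation objective, written in plain Python. It is a generic illustration and not the DistilBERT training recipe: the function names, the example scores, and the temperature value are assumptions. The loss compares the teacher's softened output probabilities with the student's, and training would adjust the student to drive this number down.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature
    produces a softer, more spread-out distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the
    student's. Zero means the student matches the teacher exactly."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Invented raw scores over three possible outputs.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.6]  # nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different output

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(loss_close, loss_far)
```

A student whose outputs resemble the teacher's gets a small loss, and one that disagrees gets a large loss, which is the signal that lets the smaller model absorb the larger model's behavior.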

An Acknowledgement

I want to acknowledge that there is definitely a technical leap that must be made for a person to go from editing a paragraph in a Pressbooks page to that same person fine-tuning an open weights model. It reminds me of when JavaScript was first introduced into the web ecosystem (yes, I’m that old)…

Once upon a time, there was only HTML. As a markup language, it was pretty straightforward to learn, and lots of people wrote HTML. Then came JavaScript. It wasn’t a markup language – it was a programming language. And coding in JavaScript required significantly more technical expertise than writing HTML did. But the trouble was worth it – HTML and JavaScript combined in a synergistic way to make the web more interactive, more useful, and more powerful.

Admittedly, it will take more technical expertise to revise and remix generative OER like open weights models than it has taken to revise and remix traditional OER in the past. But it will be worth the trouble – traditional OER and generative OER will combine to create more interactive, more useful, and more powerful learning experiences.

Concluding Thoughts

In order to more fully leverage the potential of generative AI to make educational opportunity more accessible and more effective for learners everywhere, we need to apply the lessons learned over our 25+ years of work with OER. And the first lesson is this: open licensing – giving people permission to use their agency, enthusiasm, and creativity to engage in the 5R activities – unlocks human potential. And so we have to move beyond narrow thinking about how generative AI impacts our work with traditional OER and begin thinking more broadly about the power of “generative OER,” in which we treat generative AI itself as an OER. As we engage with open prompts and open weights in the service of learning, we will open entirely new vistas of possibilities for teaching and learning – both formal and informal.