On the Relationship Between Adopting OER and Improving Student Outcomes

I’ve been writing this article 30 minutes here and 60 minutes there for several months (WordPress tells me I saved the first bits in March). I’ve probably deleted more than is left over. It’s time to click Publish and move on.

This article started out with my being bothered by the fact that ‘OER adoption reliably saves students money but does not reliably improve their outcomes.’ For many years OER advocates have told faculty, “When you adopt OER your students save money and get the same or better outcomes!” That claim is fine enough if your primary purpose is saving students money (which feels like the direction that OER and ZTC degree advocates have been moving for some time now, and explains why I don’t feel like I’m part of that community any more). But if your primary purpose is improving student outcomes, the shrugging “sometimes it works, sometimes it doesn’t” uncertainty is utterly unacceptable. So I’ve been thinking more than I’d care to admit about the relationship between OER and improving student outcomes. This thinking, with all the benefit that hindsight affords, doesn’t always reflect well on some of my earlier research. But that’s no reason not to share it.

Leveraging the “No Significant Difference” Effect for OER Advocacy

Back in 2020 I was invited to write a very brief piece about how OER research might inform pandemic practice. The more I dug into this question, the more dissatisfied I became with the answers. As I eventually wrote in Open educational resources: Undertheorized research and untapped potential:

Many of the articles reviewed in Hilton (2016), including some articles on which I was an author, are woefully undertheorized. They are essentially media comparison studies or, to be more precise, license comparison studies. Without conceptualizing an explanatory mechanism—a reason to believe a difference might exist—they simply compare the outcomes of students whose required course materials are openly licensed with those whose materials are traditionally copyrighted. A stronger theoretical framework, including a hypothesized explanatory mechanism, is required for comparative research to provide useful insights.

It should surprise no one that media comparison studies find no significant difference in student learning. Why would students who use a pencil learn more than students who use a pen? Why would students who read an openly licensed textbook learn more than students who read a traditionally copyrighted textbook? OpenStax Director of Research Philip Grimaldi and other authors asked the same questions in their 2019 article Do open educational resources improve student learning? Implications of the access hypothesis:

Why do most comparisons of OER to traditional materials fail to find a positive effect of OER? On one hand, the primary goal of OER is to offer an alternative to commercial textbooks that are comparable in quality, but free and openly licensed. Assuming an OER textbook is no different in quality, then there are no meaningful differences to explain effects on learning outcomes. License and cost certainly should not affect learning at a cognitive level. In this sense, the frequency of null effects is expected.

“There are no meaningful differences to explain effects on learning outcomes.” This is a theme in media comparison studies that has been repeated in the literature for decades. Over 20 years ago, Lockee, Burton, and Cross (1999) explained how the field of distance education was leveraging media comparison studies to its advantage in their article No comparison: Distance education finds a new use for ‘No significant difference’:

Media comparison studies have long been criticized as an inappropriate research design for measuring the effectiveness of instructional technology. However, a resurgence in their use has recently been noted in distance education for program evaluation purposes…. Stakeholders desire to prove that participants in distance-delivered courses receive the same quality of instruction off-campus as those involved in the “traditional” classroom setting. However, the desire to prove that the quality of such distributed offerings is equal to the quality of on-campus programming often results in comparisons of achievement between the two groups of student participants. Statistically, such a research design almost guarantees that the desired outcome will be attained—that indeed distance learners perform as well as campus-based students.

“Such a research design almost guarantees that the desired outcome will be attained.” Hence, the results of OER efficacy research are completely knowable before the first datum is collected:

Media comparison studies almost always result in a finding of “no significant difference.” And because “compare the outcomes of students who used openly licensed materials with those students who used traditionally copyrighted materials” is the very definition of a media comparison study, it is essentially guaranteed that there will be no significant difference in student learning in these studies.
OER are almost always less expensive than the traditionally copyrighted materials (TCM) they replace. Consequently, it is essentially guaranteed that students will save money when their faculty adopt OER instead of TCM in these studies.

OER efficacy studies are essentially guaranteed to result in a finding that says “Students saved money and learned just as much.” This finding is often presented as if it’s miraculous when, in fact, it is essentially foreordained.

Does our ability to have predicted the results of these studies ahead of time mean we shouldn’t have done them? I don’t think so. Most faculty have never heard of a media comparison study. Many faculty needed to see a tall stack of peer reviewed journal articles showing that adopting OER wouldn’t hurt student outcomes before they would consider adopting OER in their own classes. So these early media comparison studies had an important role to play in helping faculty accept OER. They had an important role to play in OER advocacy. But, if we’re honest with ourselves, they didn’t really contribute new knowledge to the field.

Why Would Adopting OER Change Student Outcomes?

Let’s take a step back for a moment. Why would a person believe that adopting OER in place of TCM would change learning outcomes for students? What is the mechanism by which students’ outcomes would change? How exactly would it work? What would make the difference?

The most popular hypothesis is that the affordability of OER creates more access for students, and that this increase in access is what will drive changes in outcomes. Grimaldi et al. call this “the access hypothesis.” We should pause here to note that adopting OER is not the only way to increase the affordability of required learning materials, thus activating the access hypothesis. Instead of adopting OER, faculty might choose to assign students any of the billions of resources which are freely available on the public internet with the same result vis-à-vis the access hypothesis. They might create their own materials, over which they retain traditional copyright, and provide these to students in their classes for free. They might assign resources from the campus library, which students experience as being “free” since the fee they pay in support of the library is bundled with their tuition. They might assign TCM through an inclusive access program which, like library resources, students experience as being “free” since the fee they pay is bundled with their tuition. Suffice it to say that OER are not unique in their ability to provide students with access to affordable, or even free, course materials. Consequently, if the access hypothesis is your proposed mechanism of action – the reason you expect to see student outcomes change – you’re not asking questions about OER. You’re asking questions about all affordable learning materials.

Returning to OER and the access hypothesis, there’s a critically important nuance to understand about access as a mechanism for improving outcomes that most research ignores. Grimaldi et al. explain, “the access hypothesis predicts that an OER intervention should only affect a subset of students—specifically those who would not otherwise have access to the textbook.” In other words, if I was already planning to purchase the textbook, giving me a free textbook might save me money but it won’t increase my access to course materials. And if it won’t improve my access to course materials, it can’t improve my learning outcomes.
Grimaldi et al. observe that researchers ostensibly working under the access hypothesis often mistakenly assume that everyone’s learning outcomes will improve because their faculty adopted OER:

The problem with this approach is that the effect of the intervention is washed out by students who are not expected to be affected by the intervention. To draw an analogy, the current research approach in OER is the equivalent of measuring the effect of a pain relieving drug on a sample of people who are mostly not in pain. In this sense, we should not expect to observe effects of an OER intervention, even if we believe that having access to a textbook is beneficial to learning.

To be clear – I don’t believe Grimaldi et al. are downplaying the benefit OER could provide to those students who would not have had access to their course materials otherwise. They’re just making the point that we need to be honest about how large this group is. According to data in the most widely cited survey, 26.5% of students say they “frequently” don’t purchase the required textbook, meaning that 26.5% of students would “frequently” have the potential to experience gains in learning attributable to greater access to course materials. For example, if you conduct OER efficacy research grounded in the access hypothesis in a “typical” class of 100 students, you should not expect 73 of them to see any learning benefit from OER (i.e., they won’t experience an increase in access that could lead to an increase in learning outcomes). In the context of a typical OER efficacy study design, the only way you’d detect a positive learning effect attributable to the greater accessibility of OER is for the impact on those 27 other students to be so incredibly large that it statistically significantly changes the average outcomes for all 100 students. That’s a pretty tall order. (You could also try to figure out who those 27 students are and look for differential effects across sub-groups. But, for many reasons, accurately identifying those students might be an even more complex task. It’s not one that many OER efficacy studies take on.)
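To make the "tall order" concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it other than the 26.5% share is an assumption of mine for illustration: the 0.5 standard deviation benefit for affected students is made up, the function names are hypothetical, the class-wide standard deviation is treated as unchanged, and the sample size formula is the usual normal approximation for comparing two group means (two-sided alpha of 0.05, 80% power).

```python
import math

def diluted_effect(subgroup_effect: float, affected_share: float) -> float:
    """Class-wide average effect when only a share of students can benefit.

    Simplification: the class-wide standard deviation is treated as unchanged.
    """
    return subgroup_effect * affected_share

def n_per_group(effect: float, z_alpha: float = 1.96, z_power: float = 0.8416) -> int:
    """Approximate students needed per group to detect a standardized effect
    (normal approximation, two-sided alpha = 0.05, power = 0.80)."""
    return math.ceil(2 * ((z_alpha + z_power) / effect) ** 2)

affected_share = 0.265   # share who "frequently" skip the textbook (from the survey cited above)
assumed_benefit = 0.5    # ASSUMPTION: hypothetical benefit, in SD units, for affected students

whole_class = diluted_effect(assumed_benefit, affected_share)
print(f"class-wide effect: {whole_class:.4f} SD")
print(f"n per group, affected students only: {n_per_group(assumed_benefit)}")
print(f"n per group, whole class:            {n_per_group(whole_class)}")
```

Under these assumptions, a benefit you could detect with about 63 students per group (if you could study only the affected students) needs roughly 900 students per group once it is averaged across the whole class. The exact numbers matter less than the ratio, which grows with the inverse square of the affected share.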

Doesn’t Research Sometimes Show Positive Effects Associated with OER Adoption?

With all this talk of no significant difference and failing to detect effects, you may be saying to yourself, “Self, I’m sure I’ve read articles reporting studies where students who used OER had better outcomes than their peers who didn’t. Is he just going to pretend those studies don’t exist?” Not at all. I’m simply going to suggest that neither open licenses, nor cost savings, nor increased access to materials are responsible for those differences in student success.

When we reflect on the lessons learned from decades of media comparison studies, we shouldn’t expect to see a significant difference between students who use OER and those who use TCM. There’s no reason for us to expect that changing the copyright license of a work will impact student learning. And, as Grimaldi et al. demonstrate, research designs based on the access hypothesis are also unlikely to find positive effects on students. And yet, there are studies that show positive impacts associated with OER adoption. So… when studies find improvements in student outcomes, where are those improvements coming from? What are they attributable to?

Something else. Or, as the jargon goes, confounding variables.

What Are Some Potential Sources of Differences in Student Outcomes Seen in OER Research?

Of course I’m not the first to recognize the general lack of strong controls in OER research. Reviewing the efficacy studies included in Hilton’s first review (2016), Gurung (2018) writes, “All nine studies had major confounds such as method of instruction (e.g., comparing OER sections that were taught online or blended versus traditional texts used in a face-to-face class). Some studies switched exams between comparisons and some changed course design (e.g., went to a flipped model). Most study authors acknowledged that the type of textbook was not the only factor that changed.” How is one supposed to isolate the effect of OER on student learning when so many confounds are present yet unaccounted for? Clinton (2019) was more concise when she wrote, “Without control of the influences of demographics, prior academic achievement, and instructor, it is difficult to discern what the specific role of OER was on grades.” Difficult to discern indeed!

Below, I want to discuss three (of the many) potential confounding factors that often go uncontrolled in OER efficacy research, and which there are strong grounds to believe would impact student learning: the teacher, the support that teachers may receive when they adopt OER, and instructional design.

First, a pop quiz for you. Which has the greater effect on students’ outcomes – their teacher or their textbook? Of course the teacher has a far larger effect on student learning than their course materials do. For example, in Robinson et al. (2014), the standardized beta weight for OER effect was 0.03, while the beta for teacher effect was 0.21 – the effect of the teacher was seven times larger! This rather obvious point matters to us because in many studies of OER efficacy the faculty each individually chose whether or not to adopt OER (as academic freedom dictates they may). This means that, in many studies, the effect of OER on student success is perfectly confounded with the effect of the teacher on student success – they are completely inseparable. And given how much larger an effect the teacher has than the textbook does, studies that don’t control for teacher effect probably aren’t answering the question “how does students’ learning change when their faculty adopt OER?” They’re likely answering a question more akin to “how does students’ learning differ when they’re taught by the kind of faculty who are willing to experiment with teaching innovations (like OER) compared to students taught by the kind of faculty who aren’t?”
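A toy simulation can illustrate what "perfectly confounded" means here. This is a sketch with invented parameters, not a reanalysis of any study: I assume half of teachers are "innovators" whose students score 0.3 standard deviations higher on average, that only innovators adopt OER, and that OER itself has a true effect of exactly zero. The function name and every number are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

def naive_oer_gap(n_teachers: int = 200, students_per_teacher: int = 30,
                  true_oer_effect: float = 0.0) -> float:
    """Difference in mean scores (OER minus TCM) when adoption is self-selected."""
    oer_scores, tcm_scores = [], []
    for _ in range(n_teachers):
        innovator = random.random() < 0.5
        # ASSUMPTION: innovators are better teachers on average (0.3 SD)
        teacher_effect = random.gauss(0.3 if innovator else 0.0, 0.1)
        adopts_oer = innovator  # adoption is perfectly confounded with teacher type
        for _ in range(students_per_teacher):
            score = teacher_effect + random.gauss(0.0, 1.0)
            if adopts_oer:
                score += true_oer_effect
                oer_scores.append(score)
            else:
                tcm_scores.append(score)
    return statistics.mean(oer_scores) - statistics.mean(tcm_scores)

gap = naive_oer_gap()
print(f"naive OER vs. TCM gap with a true OER effect of zero: {gap:.2f} SD")
```

The naive comparison reports a gap of roughly 0.3 standard deviations "for OER" even though the materials contribute nothing in this setup. Without teacher-level controls there is no way to tell the two stories apart from the outcome data alone.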

Second, consider that many of the faculty studied in OER research projects adopted OER as part of a department-wide, institution-wide, or sometimes even system-wide initiative like a ZTC degree program. These programs are designed to persuade faculty to adopt OER, as academic freedom allows them to choose whatever course materials they like. This persuasion comes in the form of incentives and supports provided to faculty to make the process of adopting OER less painful, and typically includes things like OER “mini-grants” ranging from a few hundred to a few thousand dollars. And / or a course release so there’s ample time to redesign the course. And / or support from an instructional designer to help with course redesign. And / or professional development on topics like backward design, active learning, or OER-enabled pedagogies. Some initiatives, like the original Z Degree at Tidewater Community College, even use professional development as a gateway – only faculty who had received special training were allowed to participate in TCC’s Z Degree initiative.

If the effect of the teacher on student learning is several times larger than the effect of course materials on student learning, is it surprising that the students of faculty who receive additional support and training would outperform the students of faculty who don’t? Not at all. In fact, I think a reasonable person would hypothesize that faculty who are better supported would be more effective teachers than faculty who receive less support. Unfortunately, research on OER effectiveness rarely describes in detail the extra support provided to faculty who adopt OER. It even less frequently identifies it as a factor that could potentially contaminate study results. There may be a study out there somewhere – it’s been a minute since I’ve been plugged in to the OER community – but I’ve never read OER research that controlled these potential confounds by providing those same mini-grants and other supports to faculty who didn’t adopt OER.

Once you become aware of these two confounds, you see them (missing) throughout the literature. On a closer reading, many OER efficacy studies that appear to show a positive effect of using OER actually show a positive effect of taking classes from faculty who are willing to experiment with teaching innovations and are well supported by their institutions. That’s a genuinely interesting and useful finding; it’s just not the one that was advertised.

Finally, let’s talk about instructional design. Now, this is apparently somewhat controversial to say, but here goes: some approaches to designing learning materials are simply more effective than others. Similarly, some approaches to teaching are simply more effective than others. When an instructional designer appropriately integrates evidence-based strategies into the design of her learning materials, and when an instructor implements evidence-based practices in her teaching, we would have every reason to expect that students in that class would do better than students in another class chosen at random. Instructional design matters.

The original OLI-Statistics course comes to mind here, as reported by Lovett (2008). While this article is often included in reviews of research on OER efficacy, I think the author would be surprised by that fact. While it’s true that the course materials for OLI-Statistics were OER (licensed CC BY-NC-SA, if memory serves), the phrase “open educational resources” or OER isn’t used beyond the second sentence of the first paragraph in the article. Lovett didn’t go looking for better student outcomes because the OLI-Statistics learning materials had an open license or because they were free (i.e., “because the faculty adopted OER”); she went looking for better student outcomes because she believed that the OLI-Statistics learning materials and course structure had a dramatically more effective instructional design – which she actually describes in the article. And she was right.

Differences in pedagogical approach between course materials (and the courses they’re used in) are not only a valid reason to assume that outcomes will differ – they might be the primary reason we would assume that outcomes will differ. However, articles reporting research on OER rarely address the pedagogy enacted within the OER or the TCM being compared. This may be, in part, because the stated goal of so many OER projects is to create OER that look, sound, smell, and taste exactly like the TCM they are designed to displace. In effect, the creators of the OER have pre-controlled for instructional design as a potential confound in future efficacy studies by copying the design of the original TCM so closely. This makes open textbooks easy for faculty to understand and adopt – they look exactly like the TCM they’re replacing – but it also decreases the likelihood that there will be any improvement in student outcomes attributable to improved instructional design.

Opportunities Going Forward

First, OER researchers have an opportunity to be more thoughtful about the questions that drive their research. Specifically, researchers looking for a difference in outcomes between students whose faculty adopt OER and other students whose faculty adopt TCM should explicitly describe the mechanism that they hypothesize will cause the difference in outcomes they are looking for. Why, exactly, are we expecting there to be a difference? What is the mechanism that explains that difference? Being clear about the hypothesized explanatory mechanism would make research relating to OER efficacy significantly more useful.

Second, OER researchers have an opportunity to be more thoughtful about the way they control potential confounds. And journal reviewers and editors have an opportunity to push back more on researchers when they don’t. If we are honest with ourselves, how much can we actually learn about the impact of OER on student success when neither course discipline, course format, teacher, pedagogy, student demographics, nor institutional supports are controlled? Even if this lack of control is spelled out clearly in a “Limitations” section near the end of the article?

Finally, rather than expecting different copyright licenses or the access hypothesis to dramatically improve student outcomes, OER researchers have an opportunity to turn their attention to things that actually can make a difference in student outcomes – things like more effective instructional designs, evidence-based teaching, and faculty supports like professional development. “But wait,” you might protest, “if they turned their focus to those things, they wouldn’t really be doing OER research any more!” 😉

Appendix

For the reader who wants to dig a little deeper, here are some examples of OER efficacy studies that do a better job of controlling for potential confounds. They still have room for improvement, but it appears that you only have to tighten up the controls a little bit for the supposed effects of OER adoption to evaporate.

One example of a study with stronger controls is Allen, Guzman-Alvarez, Molinaro, and Larsen (2015). In this study one group of students used OER and another used TCM. Both groups were taught by the same instructor. Both groups were supported by the same teaching assistants. The groups were taught in back-to-back time slots on the same days of the week. Both groups took the same midterms and the same final exam. The study found no difference in midterm or final exam performance, no difference in learning gains (calculated using a set of 35 questions given during the first week of class and again as part of the final exam), and no difference in student attitudes toward or beliefs about the course topic.

Another example of a study with stronger controls is Robinson, Fischer, Wiley, and Hilton (2014). In this study public school students in a Utah school district used either OER or TCM as the primary text for their science classes. Students from previous years in the same teachers’ classrooms served as controls for students in the classroom the year of the study. This study used propensity score matched groups as well as a multiple regression model that controlled for the effects of ten student and teacher covariates. The study examined the difference in students’ test scores on the end-of-year state standardized tests in Biology, Earth Science, and Chemistry. The analysis found no difference in outcomes for students in Biology and Earth Science. In Chemistry, the study found a difference that was statistically significant but of “limited educational significance” – a difference of 1.2 points out of 192 points (a difference of about one half of one percent).

Another example of a study with stronger controls is Winitzky-Stephens and Pickavance (2017). They used a multilevel modeling approach to control for the effect of student, instructor, and course on student learning: “For the purposes of the current analysis, we have three levels, which are modeled as random effects: student, instructor, and course. At the student level, we can control for demographic characteristics such as age, gender, and race, as well as past performance such as accumulated credits and overall GPA. We also include student as a random effect because each student may enroll in more than one course. The effect of individual instructors can be captured at the next level, along with instructors’ choice to assign an OER or traditional text. At the class level, we can control for the effect of individual course, course level (development education, 1000- or 2000-level), and subject (e.g., Math, Writing, English, Biology).” The study found no significant difference in learning for “continuing students” – students for whom prior GPA could be included in the model. As you might imagine, prior GPA (how well students performed in previous classes) is the strongest predictor of how well students will perform in future classes. And prior GPA did have the strongest predictive power for all three outcomes they studied – final grade, likelihood of passing, and likelihood of withdrawing. The study found a significant difference for “new students” who used OER – but this appears to be an artifact due to the fact that “new students” have no prior GPA, and so the most powerful predictor was missing in the new students’ model.

Another example of a study with stronger controls is the final report of the Achieving the Dream OER Degree Initiative. This study, designed by SRI, looked at the impact on academic outcomes of over 60,000 students from 11 institutions who enrolled in courses where faculty adopted OER, comparing their outcomes to the outcomes of students whose faculty used TCM. They used propensity score matched groups and OLS regression for their analyses, including students’ prior achievement, demographic variables, and transcript variables as controls in their models. The analysis found no significant difference in GPA for students using OER compared to those using TCM, and a tiny effect (Hedges’ g = 0.18) on credit accumulation. Remember, the rule of thumb for interpreting Hedges’ g is that effects ranging from 0.2 to 0.5 are considered “small”.
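For readers who haven't run into this statistic before, here is a minimal sketch of how a standardized mean difference of this kind is typically computed: the difference in group means divided by the pooled standard deviation, multiplied by a small-sample correction factor. The input numbers below are invented purely to produce a value near 0.18. They are not taken from the Achieving the Dream report, and I don't know the exact procedure its authors used.

```python
import math

def hedges_g(mean1: float, mean2: float, sd1: float, sd2: float,
             n1: int, n2: int) -> float:
    """Standardized mean difference with the usual small-sample correction."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / pooled_sd            # uncorrected standardized difference
    correction = 1 - 3 / (4 * (n1 + n2) - 9)   # common approximation to the correction factor
    return d * correction

# HYPOTHETICAL inputs: credits earned in two groups of 500 students each
g = hedges_g(mean1=10.9, mean2=10.0, sd1=5.0, sd2=5.0, n1=500, n2=500)
print(f"g = {g:.2f}")
```

In these made-up numbers, a g of about 0.18 corresponds to a gap of less than one credit between groups whose credit totals vary by about five credits from student to student.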

As a bonus, here’s a recent study by Spica (2021) with a very strong design that examined whether inclusive access programs that provide students with “day one access” improved student outcomes. (As I described above, the access hypothesis applies to inclusive access programs as well as OER.) She compared DFW rates for 88,946 students across 13 community colleges and 141 courses during an inclusive access pilot semester against two previous fall semesters in which the same courses were taught. Using a hierarchical linear regression model to control for the effects of course, institution, and semester, she looked for impacts on students disaggregated by race/ethnicity, Pell grant receipt, and Adult Learner status. She found no significant difference in outcomes. Writing in the Discussion section, she summarized, “These findings suggest that Day One access alone is insufficient to produce significant gains in DFW rates, even for at-risk populations deemed most likely to benefit.”