S3: A Holistic Framework for Evaluating the Impact of Educational Innovations (Including OER)

This fall I’m once again teaching IPT 531: Introduction to Open Education at BYU (check it out – it’s designed so anyone can participate) and today I’m beginning a pilot run-through of the course redesign with a small number of students. I wanted to include a reading summarizing my current thinking on ‘evaluating the impact of OER’ in the course, so I’m letting some thoughts spill out below. This framework will be continuously improved over time.

In the past I’ve written frequently about how we evaluate the impact of OER use. These writings included ideas like the golden ratio (2009), the OER impact factor (2014), and thinking more broadly about impact (2018). Today I want to pull many of these thoughts together into a holistic, unified framework for measuring the impact of OER use (and other educational innovations). This new framework has three components: success, scale, and savings – hence the name S3. Below I’ll define each component, describe how to calculate its value, and describe how to aggregate the individual values into an overall score. I’ll then describe the applicability of the framework to evaluating educational innovations beyond OER.

Defining and Measuring Success

Student success is the first and most important component of the framework. In the S3 framework, “success” means “completing a course with a final grade that allows the course to count toward graduation.”

Research studies evaluating the impact of OER frequently report whether or not the grades received by students changed after their faculty began using OER. In the majority of cases the answer to this question is “no,” for reasons described by Grimaldi and his colleagues at OpenStax. In those cases where changes in final grade have been reported (by myself and others), they have often been reported as change in final grade percentage (86% versus 84%) or in GPA units (2.4 versus 2.6). While these changes are occasionally shown to be statistically significant, it is difficult to interpret their practical significance – because in almost every case (that I’m aware of) students who received the “higher” final grade actually appear to have received the same letter grade. In other words, a typical finding among the few papers that show a grade benefit associated with OER use is that while control students earned Cs on average, treatment students earned slightly higher Cs on average. This is what I mean when I say it’s difficult to determine the practical impact of such an “improvement” – GPAs likely don’t change in this scenario, meaning that students won’t qualify for scholarships at higher rates, improve their chances of getting into graduate school, or notice any other practical benefit.

The place where impact occurs most clearly is around the boundary between C and D final grades. Here a small change in final grade makes the difference between having to retake the course (i.e., paying tuition again, delaying graduation by another semester, possibly paying for the course materials again, etc.) and being able to count the course toward graduation. While the practical impact of the difference between a C and a slightly higher C is difficult to interpret, the difference between a D and a C is as wide as the ocean. Some research in OER has already begun using final grades of “C or better” (or the inverse, the DFW rate) as the measure of interest (e.g., Hilton et al.) and more OER impact research should follow that lead.

In the S3 framework, success is measured using the C or better rate from before OER were used (an average of multiple previous terms is a more stable measure than the single prior term) and the C or better rate after OER began being used (again, an average of multiple terms is a more desirable measure than a single term) as follows:

$$\text{success} = \frac{\text{C or better}_{\text{OER}} - \text{C or better}_{\text{Control}}}{1 - \text{C or better}_{\text{Control}}}$$

The maximum value for success is 1, and this occurs only when the C or better rate is 1 for OER users. This encodes both the idea and the goal that each and every student should succeed in the course. The value is undefined when the C or better rate is 1 for control students, as it isn’t possible to improve success in this case. You can explore the full range of values here.
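To make the arithmetic concrete, here is a minimal Python sketch of the success calculation (the function and variable names, and the guard for the undefined case, are my own):

```python
def success(c_or_better_oer, c_or_better_control):
    """Gain in the C-or-better rate, expressed as a share of the room
    for improvement that existed before OER were adopted."""
    if c_or_better_control == 1:
        raise ValueError("undefined: control students already all succeed")
    return (c_or_better_oer - c_or_better_control) / (1 - c_or_better_control)

# Example: the C-or-better rate rises from 0.70 to 0.76
print(round(success(0.76, 0.70), 3))  # 0.2 -- one fifth of the previously
                                      # failing students now succeed
```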

Defining and Measuring Scale

Scale is the next most important component in the S3 framework. Scale means “the proportion of students being reached.” If work that is highly impactful in a single setting cannot be scaled to multiple classrooms, it does little to advance the cause of improving success for each and every student. For example, supplementing a classroom instructor with an instructional designer, a researcher, and a pair of graduate students may allow incredible things to happen in that classroom, dramatically increasing the success of the students in the class. However, if there’s no practical way to adapt this model to other classrooms, we can’t use this model to help improve success for all students.

In the S3 framework, scale is measured using the number of students in sections of courses using OER (e.g., the number of students in sections of Intro to Psychology using OER) and the total number of students in all sections of those same courses (e.g., the total number of students in all sections of Intro to Psychology), as follows:

$$\text{scale} = \frac{\text{Number of students}_{\text{OER Sections}}}{\text{Number of students}_{\text{All Sections}}}$$

The maximum value for scale is also 1, and this occurs only when all relevant course sections are using OER. This encodes both the idea and the goal that each and every student should be included. Scale can include a single course (e.g., Intro to Psychology) or multiple courses (e.g., all general education courses), depending on where OER is being used at an institution.
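A similarly minimal sketch for scale (again, the names and the example numbers are my own, purely for illustration):

```python
def scale(students_in_oer_sections, students_in_all_sections):
    """Proportion of students in the relevant courses who are reached by OER."""
    return students_in_oer_sections / students_in_all_sections

# Example: 1,200 of the 3,000 students in Intro to Psychology are in OER sections
print(scale(1200, 3000))  # 0.4
```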

Defining and Measuring Savings

Savings is the final component in the S3 framework. Savings means “the amount of money spent on course materials by the average student using OER compared to the amount of money spent on course materials by the average control student.” When calculated accurately, the savings measure takes into account several factors:

  • The materials assigned to control students are available at many price points
  • Some control students don’t spend any money on course materials
  • Some OER students spend money on printed copies of OER (or on printing OER)
  • Some OER students spend money on courseware or homework systems
  • Printed copies of OER frequently cost as much or more than courseware or homework systems (e.g., see the prices of printed OpenStax books on Amazon)

Contrary to popular belief, savings is very difficult to measure accurately, and some estimation and guesswork are almost always involved.

In the S3 framework, savings is measured using the average amount of money spent by OER users and the average amount of money spent by control students, as follows:

$$\text{savings} = \frac{\text{Average amount spent}_{\text{Control}} - \text{Average amount spent}_{\text{OER}}}{\text{Average amount spent}_{\text{Control}}}$$

The maximum value for savings is also 1, and this occurs only when no student who was assigned OER spends any money on course materials. This value is undefined when no students in the control group spend any money on course materials, as it isn’t possible for OER users to save money in this case.
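And a matching sketch for savings, with a guard for the undefined case (names and dollar amounts are mine, not part of the framework):

```python
def savings(avg_spent_control, avg_spent_oer):
    """Proportion of control-student spending that OER students avoid."""
    if avg_spent_control == 0:
        raise ValueError("undefined: control students spend nothing")
    return (avg_spent_control - avg_spent_oer) / avg_spent_control

# Example: control students spend $100 on average; OER students spend $25
print(savings(100.0, 25.0))  # 0.75
```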

Calculating an Overall Score

In aggregating the individual scores into an overall score, we must consider the amount each individual component will contribute to the overall score.

I’ve argued above that success is the most important component of the three in this framework. As I wrote about at length in Taking Our Eye Off the Ball, I believe it is a huge mistake for us to look at 30% graduation rates from US community colleges and say, “the most important thing we can do is make that abysmal outcome less expensive.” (Likewise for the 60% graduation rate from US universities.) Making a 70% failure rate (or a 40% failure rate) more affordable is not the most important work we can do. We have to begin making meaningful progress on student success, and it should be weighted most heavily of the three components in the framework.

I’ve argued above that scale is the next most important component of the framework. Affordability (or savings) is one of the characteristics of a scalable innovation, as students can’t benefit from something they can’t afford. But it takes a lot more to make an educational innovation scale successfully than simply being affordable (or even free) – like how attractive it is to faculty or how easy or hard it is to implement successfully. Inasmuch as scale encompasses savings along with a range of other factors, it should be weighted more heavily than savings alone.

That said, I think it is important to continue to include savings as a standalone measure, if for no other reason than the historical importance of cost savings in the research on the impact of OER. But more importantly, including savings as a standalone measure of impact helps us remember the problems and mistakes of the past (and present) with regards to the pricing of course materials. Hopefully, keeping these errors front and center will decrease the chances that new innovations will travel down that path again.

As I have pondered relative weights for the components, it seems that success is at least twice as important as scale, and that scale is at least twice as important as savings. If we use:

$$\text{impact} = 4 \times \text{success} + 2 \times \text{scale} + 1 \times \text{savings}$$

We get a measure of impact with a maximum value of 7. I like that.
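Putting the pieces together, a minimal sketch of the aggregate score, reusing the illustrative values from the sketches above (the weights are the ones defined in the formula):

```python
def impact(success, scale, savings):
    """Weighted S3 aggregate; the maximum possible value is 4 + 2 + 1 = 7."""
    return 4 * success + 2 * scale + 1 * savings

print(impact(success=0.2, scale=0.4, savings=0.75))  # 2.35 out of a possible 7
```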

Extending the Framework to Educational Innovations Beyond OER

Viewed from a high level, OER is just one of hundreds of innovations that proponents say have the potential to improve education. To the degree that is true, there is no reason to develop an OER-specific measure of impact. Indeed, it would be eminently useful to have a common measure of impact we could use to compare a wide range of would-be innovations.

There is nothing OER-specific about S3. You could use it to measure the impact of “things” like learning analytics or iPads or augmented reality. You could use it to measure the impact of “approaches” like problem-based pedagogies or collaborative problem solving pedagogies or active learning pedagogies. The questions S3 keeps us focused on are:

  1. How much does this innovation improve student success?
  2. How many of our students are benefiting from this innovation?
  3. How much money does this innovation save students?

These are all good questions to ask.

The Revisability Paradox

Long-time readers will be familiar with “learning objects” and the “reusability paradox.” If you’ve been working in educational technology since the 1990s, you might want to skip the first section below. Or you may find it a sentimental walk down memory lane.

Learning objects and the reusability paradox

A learning object is “any digital resource that can be reused to support learning” (Wiley, 2000), and the goal of the learning objects movement was to design learning materials that were sufficiently small and self-contained as to be easily reused across many different learning contexts. Remember the joy of digging into a bin of Legos, pulling out random pieces and assembling them into whatever your heart fancied? This was the promise of learning objects, which were compared to Legos in almost every conference presentation and journal article on the topic.

The reusability paradox describes a difficulty at the heart of the learning objects idea. Here’s how I described it in the late 1990s:

1. The “bigger” a learning object is, the more (and the more easily) a learner can learn from it. For example, there’s only so much you can learn by studying a single photograph of a mountain (an image is a “small” learning object). On the other hand, you can learn quite a lot from a chapter on mountain formation, with multiple images, animations, and explanatory text (a chapter is a “large” learning object).

2. The “bigger” a learning object is, the fewer places it can be reused. For example, a single image of a mountain can be placed into a wide range of learning materials (e.g., it could be embedded in chapters about geography, history, photography, etc.). On the other hand, there are only so many places you can reuse an entire chapter on mountain formation.

To state it briefly, there is an inverse relationship between reusability and pedagogical effectiveness.

Because pedagogical effectiveness and potential for reuse are completely at odds with one another, the designer of learning objects faces a difficult choice. They can either (1) design smaller objects that are easier to reuse but require significantly more effort and assembly on the part of the instructor before they are useful for learning, or (2) design larger objects that more effectively support learning but have limited potential for reuse.

(If you’ve ever thought my writing was insufferably pedantic before, I promise you ain’t seen nothin’ yet. Try this detailed elucidation of the reusability paradox from 2002.)

The modern reader will no doubt scratch their head and ask, “why not just start with a larger, more useful learning object and adapt it to meet your needs?” The answer, of course, is that in the late 90s and early 00s the open content movement was only just beginning. We didn’t have the 5Rs back then. The book on learning objects I linked to above was published under the Open Publication License because Creative Commons didn’t even exist yet. The universal (and universally true) assumption was that learning objects were traditionally copyrighted, meaning you had to reuse them exactly as you found them (just like Legos).

OER and the Revisability Paradox

That bit of history prepares us to discuss open educational resources (OER) and the revisability paradox.

Open educational resources are teaching, learning, and research materials that are either (1) in the public domain or (2) licensed in a manner that provides everyone with free and perpetual permission to engage in the 5R activities. I don’t believe readers of this blog need much additional context about OER.

The revisability paradox describes a difficulty at the heart of the OER idea. Here’s my current best attempt at explaining it:

1. The more research-based the instructional design embedded within an open educational resource is, the more (and the more easily) a learner can learn from it. For example, there’s only so much you can learn by reading an explanation of what a conifer is (a “simple design” OER). On the other hand, you can learn quite a lot from an activity that (1) isolates and explicitly describes the critical attributes that separate instances of conifers from non-instances and (2) provides you with the opportunity to practice classifying trees as instances or non-instances, coupled with immediate, targeted feedback (a “research-based design” OER).

2. The more research-based the instructional design of an OER is, the harder it is to revise and remix without hurting its effectiveness (that is, the more instructional design expertise is necessary to revise and remix effectively). For example, many different kinds of changes could be made to a simple explanation of conifers without changing its effectiveness in supporting student learning. On the other hand, there are many ways of changing the research-based design that would cause it to be no more effective than the simple explanation.

In essence, without instructional design expertise, looking at well designed learning resources is like watching a sporting event whose rules and nuances you don’t understand. Remember watching hockey/baseball/soccer/cricket for the first time? Remember the first time you watched with someone who deeply understood the game, and you started to realize how much you were missing – even though you were both watching the same game? It’s like that, but with the research on supporting learning effectively. Without instructional design expertise it’s easy to look at something like the explicit isolation of critical attributes and think “Boring! I have a way more interesting way of explaining that!” …and we’re right back to explaining.

In other words, there is an inverse relationship between revisability and pedagogical effectiveness.

Implications

One of the amazing things about the OER movement is how the “OER way of thinking” has democratized access to the creation of learning materials. Anyone with domain expertise and a word processor or WordPress instance can write definitions, descriptions, and explanations. This won’t result in particularly effective learning materials, but it will result in OER that look a lot like their traditionally copyrighted counterparts, are less expensive than their traditionally copyrighted counterparts, and that anyone can revise or remix without doing any damage.

Image credit: “Jenga” by Ed Garcia (https://flic.kr/p/4GW2c2), licensed CC BY.

Revising or remixing OER with a research-based instructional design is much more like playing Jenga blindfolded. When you don’t fully understand the instructional functions of the different elements of the learning materials, there’s no way to know whether pulling one out or swapping it for something else or changing it in some other way will cause the whole efficacy tower to collapse. And the biggest problem, of course, is that when you destroy the efficacy tower you don’t know you did – because you don’t know the rules of the game and can’t really see what’s happening.

Choices… and a Question

The designer of learning objects can either (1) design smaller objects that are easier to reuse but require significantly more effort and assembly on the part of the instructor before they are useful for learning, or (2) design larger objects that more effectively support learning but have limited potential for reuse.

Likewise, the designer of open educational resources can either (1) create “simple OER” – resources with rudimentary instructional designs that aren’t particularly effective at supporting student learning but are easy to revise and remix without decreasing their effectiveness, or (2) create “complex OER” – resources using research-based instructional designs that are far more difficult to revise and remix without decreasing their effectiveness (i.e., they’re “easy to break”).

Which leads me to wonder… What is the role of instructional design / learning science / learning engineering / related forms of expertise in the creation – or revising and remixing – of learning materials? Insisting that this expertise is important feels like it pulls against the democratizing power of modern conceptions of openness in education. But denying that this expertise matters feels like it joins the broader anti-expertise chorus currently eroding public policy.

So… now what?

Comments on the US DoEd Proposed Rule – Open Textbook Pilot Program

I submitted the following comment today on the Department of Education’s proposed rule “Open Textbook Pilot Program.” The deadline to submit a comment is April 30, so read the rule and get your comments in soon.

It may surprise readers to find me arguing against requirements of openness in some of my comments. But in the spirit of “pragmatism before zeal,” I argue in my comments specifically against three unintended consequences of open requirements as they pertain to LMSs, efficacy research, and assessment security / adoptability. It is true that, if the Department acts on my first two comments below, there are aspects of the work that might otherwise have been open that will not end up being open. However, if the Department does act on these comments, the parts of the work that are open will be more widely adopted, will result in more students saving more money, and, most importantly, will result in more students learning more.


To Whom It May Concern:

My name is David Wiley. Please allow me to say a few words regarding my qualifications for commenting on the Open Textbook Pilot Program. I hold a PhD in Instructional Psychology and Technology from Brigham Young University. My publications about open education have been cited over 4000 times (https://scholar.google.com/citations?user=M47HR7IAAAAJ). I am the author of the 5Rs framework (https://openeducationalresources.org/) that many colleges and universities use to frame their open education initiatives. I am the founder of the Open Education Conference, which recently met for its 16th annual convening. I previously ran a major university research center dedicated to open education (the Center for Open and Sustainable Learning at Utah State University) and I am currently the Chief Academic Officer of Lumen Learning. In calendar year 2019, Lumen Learning directly supported over 250,000 students in using open educational resources. We estimate that these students collectively saved over $30M. Another 75 million learners freely accessed the open educational resources on our website in 2019.

In this letter I am consolidating previous comments I have made on this program as well as adding new comments.

The Definition of “Open Textbook”

The proposed definition of “open textbook” reads, in part:

“An open textbook may also include a variety of open educational resources or materials used by instructors in the development of a course and those learning activities necessary for successful completion of a course by students. These include any learning exercises, technology-enabled experiences (e.g., simulations), and adaptive support and assessment tools.”

The text is unclear regarding whether or not the tools that provide adaptive support, the tools that provide assessment capabilities, and any other tools that might be used to store, manage, deliver, augment, or support “open textbooks” must also be openly licensed. The text is also unclear regarding whether or not these tools must be made available to students for free. It is absolutely critical that the rule clarify these questions.

Specifically, it is critically important that the answer to these questions be “no.” The overwhelming majority of “open textbooks” used in US higher education today are delivered to students via learning management systems (LMSs) like Blackboard, Canvas, and Desire2Learn. Grant recipients will rightly want to continue this practice. However, the majority of LMSs (1) are not openly licensed and (2) have hosting, maintenance, and other associated costs that institutions frequently pass on to students (e.g., as part of a mandatory technology fee). Requiring grant recipients to use only openly licensed or freely available tools to support the delivery, usage, maintenance, and support of “open textbooks” will effectively prohibit awardees from using their own learning management systems to offer classes with “open textbooks.” This would be a horrible consequence.

It is also true that the overwhelming majority of non-LMS technology platforms that provide adaptive, assessment, and other complementary capabilities are neither openly licensed nor freely available. Requiring all tools used in conjunction with “open textbooks” to be openly licensed or freely available will also prohibit awardees from leveraging a wide range of teaching and learning capabilities in conjunction with their “open textbooks”. Such a requirement would also be quite curious, given that the primary technology platforms used by previous awardees under this program – namely, LibreTexts and Smart Sparrow – are both proprietary technology platforms.

A prohibition on using proprietary technology platforms in conjunction with “open textbooks” would also sabotage Proposed Priority 3(b), which requires awardees to evaluate the impact of open textbooks on learning outcomes and course outcomes. When the treatment group of students is using “open textbooks” that are required to be essentially static content (like a PDF) delivered outside of the campus learning management system, and the control group is using equivalent content integrated within the LMS and complemented by advanced adaptive, assessment, and other capabilities, the impact of “open textbooks” on student outcomes is all but guaranteed to be negative. This would also be a horrible consequence of the proposed rule.

In conclusion, the proposed rule should be amended to clearly state that, for the purposes of the grant, the tools used in conjunction with “open textbooks” are not required to be either openly licensed or freely available.

Licensing of Ancillary Resources

The proposed rule mentions “ancillary learning resources,” “ancillary instructional materials,” and “ancillary materials” but does not define any of these terms. While instructional content produced under the grant must be made available under a “worldwide, non-exclusive, royalty-free, perpetual, and irrevocable license to the public to exercise any of the rights under copyright conditioned only on the requirement that attribution be given as directed by the copyright owner,” the licensing status of ancillaries is never addressed directly in the proposed rule. It must be.

Presumably, the category of ancillaries includes individual assessment items, assessment banks, complete quizzes, homework problems, assignments, rubrics, and model answers. It is critically important that there NOT be a requirement for assessments designed to measure student learning to be released under the same open licensing terms as instructional content. If assessments designed to measure student learning (hereafter, “assessments”) are required to be openly licensed, the department’s investment in their creation will be wasted within a matter of months.

Assessments from a wide range of commercial publishers and OER providers inevitably end up on cheating websites where students share questions and answers with one another. Once that happens, all a student has to do is type a partial question into Google and they can immediately find correct answer information. When assessments are traditionally copyrighted, a takedown notice can be issued to a website publicly sharing copyrighted assessments. However, when assessments are openly licensed, a cheating website is within its rights to continue publishing homework and quiz answers. While the game of whack-a-mole with various cheating sites that publish copyrighted assessments can be time consuming, copyright at least makes it possible to demand that assessments are taken down. When assessments are openly licensed, there is no recourse for the assessment creator. In other words, the open licensing of assessments undermines assessment security.

Faculty frequently spot check questions in assessment banks to see if the answers are available to students online. If they find the answers online, faculty know they can’t use those assessments in their courses. Inasmuch as the availability of ancillaries like assessments is a major factor in faculty decisions to adopt “open textbooks,” the goals of the proposed rule would be served greatly by encouraging the creators of assessments to maintain traditional copyrights on those assessments. The language of the proposed rule should make it clear that they are permitted to do so.

In conclusion, the proposed rule should be amended to clearly state that, for the purposes of the grant, the ancillary resources created in conjunction with “open textbooks” are not required to be openly licensed.

Technical Assistance Providers

As currently written, an “eligible applicant” is a consortium composed of “at least” IHEs, a single educational technology or curriculum design expert, and an advisory group of “sector partners.” While the language “at least” in the definition of “eligible applicant” does not rule out the inclusion of organizations that provide technical assistance in consortia, the proposed rule should be amended to specifically state that “technical assistance providers” are permitted to be members of consortia. Technical assistance providers have specialized expertise that will likely be valuable to grantees under the program. For example, Creative Commons, the organization that creates the copyright licenses used by the overwhelming majority of “open textbooks,” has previously provided technical assistance to recipients of federal grants as awardees have worked to comply with open licensing requirements. There is no reason to discriminate between non-profit and for-profit entities in the provision of technical assistance to grantees.

In conclusion, the proposed rule should be amended to make clear that consortia with technical assistance providers as members are “eligible applicants,” and that both non-profit and for-profit entities are eligible to serve as technical assistance providers.


Thank you for the opportunity to comment on the proposed rule. I look forward to the Department following Congressional direction this year in awarding a large number of smaller grants under the Open Textbooks Pilot Program.

Yours,

David Wiley, PhD