S3: A Holistic Framework for Evaluating the Impact of Educational Innovations (Including OER)

This fall I’m once again teaching IPT 531: Introduction to Open Education at BYU (check it out – it’s designed so anyone can participate) and today I’m beginning a pilot run-through of the course redesign with a small number of students. I wanted to include a reading summarizing my current thinking on ‘evaluating the impact of OER’ in the course, so I’m letting some thoughts spill out below. This framework will be continuously improved over time.

In the past I’ve written frequently about how we evaluate the impact of OER use. These writings included ideas like the golden ratio (2009), the OER impact factor (2014), and thinking more broadly about impact (2018). Today I want to pull many of these thoughts together into a holistic, unified framework for measuring the impact of OER use (and other educational innovations). This new framework has three components: success, scale, and savings – hence the name S3. Below I’ll define each component, describe how to calculate its value, and describe how to aggregate the individual values into an overall score. I’ll then describe the applicability of the framework to evaluating educational innovations beyond OER.

Defining and Measuring Success

Student success is the first and most important component of the framework. In the S3 framework, “success” means “completing a course with a final grade that allows the course to count toward graduation.”

Research studies evaluating the impact of OER frequently report whether or not the grades received by students changed after their faculty began using OER. In the majority of cases the answer to this question is “no,” for reasons described by Grimaldi and his colleagues at OpenStax. In those cases where changes in final grade have been reported (by myself and others), they have often been reported as change in final grade percentage (86% versus 84%) or in GPA units (2.4 versus 2.6). While these changes are occasionally shown to be statistically significant, it is difficult to interpret their practical significance – because in almost every case (that I’m aware of) students who received the “higher” final grade actually appear to have received the same letter grade. In other words, a typical finding among the few papers that show a grade benefit associated with OER use is that while control students earned Cs on average, treatment students earned slightly higher Cs on average. This is what I mean when I say it’s difficult to determine the practical impact of such an “improvement” – GPAs likely don’t change in this scenario, meaning that students won’t qualify for scholarships at higher rates, improve their chances of getting into graduate school, or notice any other practical benefit.

The place where impact occurs most clearly is around the boundary between C and D final grades. Here a small change in final grade makes the difference between having to retake the course (i.e., paying tuition again, delaying graduation by another semester, possibly paying for the course materials again, etc.) and being able to count the course toward graduation. While the practical impact of the difference between a C and a slightly higher C is difficult to interpret, the difference between a D and a C is as wide as the ocean. Some research in OER has already begun using final grades of “C or better” (or the inverse, the DFW rate) as the measure of interest (e.g., Hilton et al.) and more OER impact research should follow that lead.

In the S3 framework, success is measured using the C or better rate from before OER were used (an average of multiple previous terms is a more stable measure than the single prior term) and the C or better rate after OER began being used (again, an average of multiple terms is a more desirable measure than a single term) as follows:

$$success\;=\;\frac{C\;or\;better_{OER}\;-\;C\;or\;better_{Control}}{1\;-\;C\;or\;better_{Control}}$$

The maximum value for success is 1, and this occurs only when the C or better rate is 1 for OER users. This encodes both the idea and the goal that each and every student should succeed in the course. The value is undefined when the C or better rate is 1 for control students, as it isn’t possible to improve success in this case. You can explore the full range of values here.
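To make the calculation concrete, here is a minimal Python sketch of the success formula above. The function names (`mean_rate`, `success_score`) and the rates in the example are hypothetical, invented only for illustration. Returning `None` for the undefined case is an assumption about how one might represent it in code.

```python
from typing import Optional, Sequence


def mean_rate(term_rates: Sequence[float]) -> float:
    """Average the C-or-better rate over several terms."""
    return sum(term_rates) / len(term_rates)


def success_score(rate_oer: float, rate_control: float) -> Optional[float]:
    """S3 success: the share of the possible improvement that was achieved.

    Returns None when the control rate is already 1, because the
    score is undefined in that case.
    """
    if rate_control == 1:
        return None
    return (rate_oer - rate_control) / (1 - rate_control)


# Hypothetical rates, invented purely for illustration.
control = mean_rate([0.70, 0.72, 0.68])  # three terms before OER
oer = mean_rate([0.75, 0.77])            # two terms after OER
print(round(success_score(oer, control), 3))
```

With these made-up numbers, the C or better rate rises from about 0.70 to 0.76, which closes roughly one fifth of the gap to 1, so the score is about 0.2. One property that follows directly from the formula: if the C or better rate is lower with OER than it was for control students, the score is negative.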

Defining and Measuring Scale

Scale is the next most important component in the S3 framework. Scale means “the proportion of students being reached.” If work that is highly impactful in a single setting cannot be scaled to multiple classrooms, it does little to advance the cause of improving success for each and every student. For example, supplementing a classroom instructor with an instructional designer, a researcher, and a pair of graduate students may allow incredible things to happen in that classroom, dramatically increasing the success of the students in the class. However, if there’s no practical way to adapt this model to other classrooms, we can’t use this model to help improve success for all students.

In the S3 framework, scale is measured using the number of students in sections of courses using OER (e.g., the number of students in sections of Intro to Psychology using OER) and the total number of students in all sections of those same courses (e.g., the total number of students in all sections of Intro to Psychology), as follows:

$$scale\;=\;\frac{Number\;of\;students_{OER\;Sections}}{Number\;of\;students_{All\;Sections}}$$

The maximum value for scale is also 1, and this occurs only when all relevant course sections are using OER. This encodes both the idea and the goal that each and every student should be included. Scale can include a single course (e.g., Intro to Psychology) or multiple courses (e.g., all general education courses), depending on where OER is being used at an institution.
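As a rough illustration, the scale calculation could be sketched in Python as follows. The function names and enrollment figures are hypothetical. Pooling students across courses before dividing (rather than averaging per-course ratios) is an assumption about how to handle the multi-course case, as is returning `None` when no students are enrolled.

```python
from typing import Dict, Optional, Tuple


def scale_score(students_oer: int, students_all: int) -> Optional[float]:
    """S3 scale: proportion of students in the relevant courses using OER."""
    if students_all == 0:
        return None  # assumption: no enrolled students means no meaningful ratio
    return students_oer / students_all


def pooled_scale(enrollments: Dict[str, Tuple[int, int]]) -> Optional[float]:
    """Pool (OER students, all students) pairs across several courses."""
    oer_total = sum(oer for oer, _ in enrollments.values())
    all_total = sum(total for _, total in enrollments.values())
    return scale_score(oer_total, all_total)


# Hypothetical enrollments: (students in OER sections, students in all sections).
enrollments = {
    "Intro to Psychology": (300, 1200),
    "Intro to Sociology": (150, 300),
}
print(pooled_scale(enrollments))
```

In this invented example, 450 of 1,500 students are in OER sections, so scale is 0.3 even though one of the two courses has reached half of its students.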

Defining and Measuring Savings

Savings is the final component in the S3 framework. Savings means, “the amount of money spent on course materials by the average student using OER compared to the amount of money spent on course materials by the average control student.” When calculated accurately, the savings measure takes into account several factors:

The materials assigned to control students are available at many price points
Some control students don’t spend any money on course materials
Some OER students spend money on printed copies of OER (or on printing OER)
Some OER students spend money on courseware or homework systems
Printed copies of OER frequently cost as much as or more than courseware or homework systems (e.g., see the prices of printed OpenStax books on Amazon)

Contrary to popular belief, savings is very difficult to measure accurately, and some estimation and guesswork are almost always involved.

In the S3 framework, savings is measured using the average amount of money spent by OER users and the average amount of money spent by control students, as follows:

$$savings\;=\;\frac{Average\;amount\;spent_{Control}\;-\;Average\;amount\;spent_{OER}}{Average\;amount\;spent_{Control}}$$

The maximum value for savings is also 1, and this occurs only when no student who was assigned OER spends any money on course materials. This value is undefined when no students in the control group spend any money on course materials, as it isn’t possible for OER users to save money in this case.
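A small Python sketch of the savings formula above may help. The function names and dollar amounts are hypothetical, and in practice the per-student spending figures would themselves be estimates. Students who spend nothing are included as zeros so that the averages reflect the factors listed earlier.

```python
from typing import Optional, Sequence


def average_spent(amounts: Sequence[float]) -> float:
    """Mean spending per student, counting students who spent nothing as 0."""
    return sum(amounts) / len(amounts)


def savings_score(avg_oer: float, avg_control: float) -> Optional[float]:
    """S3 savings: relative reduction in average spending on course materials.

    Returns None when control students spent nothing on average,
    because the score is undefined in that case.
    """
    if avg_control == 0:
        return None
    return (avg_control - avg_oer) / avg_control


# Hypothetical per-student spending in dollars, invented for illustration.
control_spending = [0, 40, 80, 120]  # one student bought nothing
oer_spending = [0, 0, 0, 40]         # one student paid for a printed copy
score = savings_score(average_spent(oer_spending), average_spent(control_spending))
print(round(score, 3))
```

Here the average falls from $60 to $10, giving a score of about 0.83 rather than 1, because one OER student still spent money. The formula also yields a negative score whenever OER students spend more on average than control students.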

Calculating an Overall Score

In aggregating the individual scores into an overall score, we must consider the amount each individual component will contribute to the overall score.

I’ve argued above that success is the most important component of the three in this framework. As I wrote about at length in Taking Our Eye Off the Ball, I believe it is a huge mistake for us to look at 30% graduation rates from US community colleges and say, “the most important thing we can do is make that abysmal outcome less expensive.” (Likewise for the 60% graduation rate from US universities.) Making a 70% failure rate (or a 40% failure rate) more affordable is not the most important work we can do. We have to begin making meaningful progress on student success, and it should be weighted most heavily of the three components in the framework.

I’ve argued above that scale is the next most important component of the framework. Affordability (or savings) is one of the characteristics of a scalable innovation, as students can’t benefit from something they can’t afford. But it takes a lot more to make an educational innovation scale successfully than simply being affordable (or even free) – like how attractive it is to faculty or how easy or hard it is to implement successfully. Inasmuch as scale reflects savings along with a range of other factors, it should be weighted more heavily than savings alone.

That said, I think it is important to continue to include savings as a standalone measure, if for no other reason than the historical importance of cost savings in the research on the impact of OER. But more importantly, including savings as a standalone measure of impact helps us remember the problems and mistakes of the past (and present) with regard to the pricing of course materials. Hopefully, keeping these errors front and center will decrease the chances that new innovations will travel down that path again.

As I have pondered relative weights for the components, it seems that success is at least twice as important as scale, and that scale is at least twice as important as savings. If we use

$$impact\;=\;4\times success\;+\;2\times scale\;+\;1\times savings$$

we get a measure of impact with a maximum value of 7 (4 + 2 + 1, reached only when all three components equal 1). I like that.
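The weighted sum can be sketched in a few lines of Python. The function name, the `WEIGHTS` dictionary, and the component values in the example are hypothetical. Returning `None` when any component is undefined is an assumption, since the post does not say how to treat that case.

```python
from typing import Optional

# Weights from the impact formula: success counts twice as much as scale,
# and scale counts twice as much as savings.
WEIGHTS = {"success": 4, "scale": 2, "savings": 1}


def impact_score(
    success: Optional[float],
    scale: Optional[float],
    savings: Optional[float],
) -> Optional[float]:
    """Weighted S3 impact score; None if any component is undefined."""
    components = {"success": success, "scale": scale, "savings": savings}
    if any(value is None for value in components.values()):
        return None  # assumption: an undefined component makes the total undefined
    return sum(WEIGHTS[name] * value for name, value in components.items())


# Hypothetical component scores, invented for illustration.
print(round(impact_score(0.2, 0.3, 0.5), 2))
print(impact_score(1, 1, 1))
```

With the invented values above, the weighted parts are 0.8, 0.6, and 0.5, for a total of 1.9 out of a possible 7. Even with savings at 0.5, the modest success and scale scores keep the total low, which is the intended effect of the weighting.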

Extending the Framework to Educational Innovations Beyond OER

Viewed from a high level, OER is just one of hundreds of innovations whose proponents say they have the potential to improve education. To the degree that is true, there is no reason to develop an OER-specific measure of impact. Indeed, it would be eminently useful to have a common measure of impact we could use to compare a wide range of would-be innovations.

There is nothing OER-specific about S3. You could use it to measure the impact of “things” like learning analytics or iPads or augmented reality. You could use it to measure the impact of “approaches” like problem-based pedagogies or collaborative problem solving pedagogies or active learning pedagogies. The questions S3 keeps us focused on are:

How much does this innovation improve student success?
How many of our students are benefiting from this innovation?
How much money does this innovation save students?

These are all good questions to ask.