open content

On the Impossibility of the Community-based Production of Learning Content

UPDATE: I borrowed the “community based” language in the title of this post from Martin’s blog, which reminded me of Yochai’s article and prompted this post. That language has caused confusion on social media. (Long-time readers of this blog will be surprised to learn that definitions matter!) I should have used Yochai’s language of “peer production of educational materials” from the start. Perhaps that would have headed off some of the misunderstanding on Twitter. Perhaps.

In a post this morning, Martin wrote, “We’ve still not really cracked a community based production model for learning content.” It got me thinking.

Back in 2005 I was blessed with the opportunity to commission a short paper from Yochai Benkler (who did much of the first serious work on the economics of open source software development, e.g., Coase’s Penguin and Sharing Nicely) in conjunction with a keynote talk he gave at OpenEd that year. The paper he produced, Common Wisdom: Peer Production of Educational Materials, is what I believe to be one of the most important and least known writings in the first, formative decade of open education as we know it today.

Benkler’s main argument in the paper focuses on the relationship between modularization and integration. (The argument may sound familiar to readers who have encountered the reusability paradox.) He points out that the number of people who will volunteer to contribute to a project grows as the “smallest unit of contribution” shrinks. If a contribution can be made in a few minutes, many people might be willing / able to contribute to producing learning content. If a contribution requires a minimum of a few hours, far fewer people may be willing / able to contribute. “As I have elsewhere discussed in great detail, the size of the potential pool of contributors, and therefore the probability that the right person with the right skills, motivation, and time will be available for the job is inversely related to the granularity of the modules” (pp. 21-22). Schweik and English would later empirically demonstrate that this is true in the context of open source software.

If modularization and its effect on the availability of volunteers is one side of the problem, the other is the leadership, administration, and integration necessary to bring a very large collection of very small pieces together into a useful whole. “Integrating and smoothing out the text, style, and coherent structure of a chapter from contributions in much smaller tasks becomes much harder. The result of making the modules more fine grained may be to make the integrated whole too difficult to render coherent” (p. 20).

He summarizes: “The larger the granules the more is required of each contributor, the smaller the set of agents who will be willing and able to take a crack at the work. On the other hand, the granularity is determined by the cost of integration—you cannot use modules that are so fine that the cost of integrating them is higher than the value of including the module. The case of textbooks seems to be, at present, precisely at the stage where the minimal granularity of the modules in some projects—like FHSST—is too large to capture the number of contributions necessary to make the project move along quickly and gain momentum, whereas the cost of integration in others, like WikiBooks, is so high that most of the projects languish with a module here, and module there, and no integration” (pp. 21-22).

(The one place where I would push back on Yochai’s analysis is in what he sees as the difference between the adopters of K-12 textbooks and college textbooks. While K-12 textbooks need high degrees of coherence and must be created in accordance with existing standards before they can be adopted, he proposed that the same wasn’t true for higher education. It’s generally up to post-secondary teachers “to construct, integrate, and use the materials as fits their needs. No higher order organization is required, and none therefore represents a barrier to contribution” (p. 23). Meaning that you could simply have a community create lots of very small pieces without requiring any centralized integration service, because every faculty member will undertake that work on their own and do it in a way that meets their specific local needs. While this statement might reflect the reality of faculty who teach upper-level undergraduate and graduate courses, it’s also true that over half of US college and university courses are taught by adjunct faculty who need a complete, coherent learning resource that is ready to pick up and teach from on day one. So, for the foreseeable future, the problems of modularization and integration apply to higher ed as well as K-12.)

The problems associated with the need to modularize and the need to integrate are just as real now as they were back in the early 2000s. (And, for those of you who worked on learning objects, in the 1990s.) This means that “we’ve still not really cracked a community based production model for learning content” is likely a dramatic understatement of the problem. There’s a good argument to be made that a community based production model for learning content isn’t actually possible. Yes, it might be possible to set up a system where some people will contribute small pieces of learning content to a repository, but for the reasons described above those small pieces will never see adoption at scale due to problems relating to integration and coherence. And we should consider any production model that results in the creation of learning content that goes unused to be a failed model.

Learning Engineering and Reese’s Cups

Reposting this message I sent to the Learning Analytics mailing list earlier this morning.

When I hear people say “learning engineering” I hear them talking about Reese’s cups.

I hear them talking about delicious chocolate (instructional design, or applied learning science or whatever you like to call it) and yummy peanut butter (learning analytics, or educational data mining, or whatever you like to call it). Chocolate and peanut butter are two things that, individually, taste great. And they taste even better together. In fact, they taste so much better together that people gave the combination its own name! They didn’t give this heaven-sent sweetie its own name in order to exercise dominance over either the chocolate or peanut butter industries. It was just really convenient to have a specific name to talk about this utterly fantastic combination of things. “I want a Reese’s cup!”

As I understand it, learning engineering is nothing more or less than a specific way of combining ID/ALS and LA/EDM techniques in order to engage in the iterative, data-driven continuous improvement of products designed to support learning:

  • You design something intended to support student learning (could be content, software, courseware, whatever).
  • You put it in the field and get students using it.
  • You measure its success at supporting student learning using a variety of analysis techniques.
  • You zero in on the parts that aren’t supporting student learning as successfully as you had hoped they would.
  • You re-design them.
  • You re-deploy them.
  • You re-analyze the degree to which they successfully support student learning.
  • You rinse and repeat.

That’s how I understand “learning engineering.” I could just as easily say, “the combination of specific instructional design and learning analytics techniques in support of iterative, data-driven continuous improvement.” Well, actually, no I couldn’t say that just as easily. 🙂

S3: A Holistic Framework for Evaluating the Impact of Educational Innovations (Including OER)

This fall I’m once again teaching IPT 531: Introduction to Open Education at BYU (check it out – it’s designed so anyone can participate) and today I’m beginning a pilot run-through of the course redesign with a small number of students. I wanted to include a reading summarizing my current thinking on ‘evaluating the impact of OER’ in the course, so I’m letting some thoughts spill out below. This framework will be continuously improved over time.

In the past I’ve written frequently about how we evaluate the impact of OER use. These writings included ideas like the golden ratio (2009), the OER impact factor (2014), and thinking more broadly about impact (2018). Today I want to pull many of these thoughts together into a holistic, unified framework for measuring the impact of OER use (and other educational innovations). This new framework has three components: success, scale, and savings – hence the name S3. Below I’ll define each component, describe how to calculate its value, and describe how to aggregate the individual values into an overall score. I’ll then describe the applicability of the framework to evaluating educational innovations beyond OER.

Defining and Measuring Success

Student success is the first and most important component of the framework. In the S3 framework, “success” means “completing a course with a final grade that allows the course to count toward graduation.”

Research studies evaluating the impact of OER frequently report whether or not the grades received by students changed after their faculty began using OER. In the majority of cases the answer to this question is “no,” for reasons described by Grimaldi and his colleagues at OpenStax. In those cases where changes in final grade have been reported (by myself and others), they have often been reported as change in final grade percentage (86% versus 84%) or in GPA units (2.4 versus 2.6). While these changes are occasionally shown to be statistically significant, it is difficult to interpret their practical significance – because in almost every case (that I’m aware of) students who received the “higher” final grade actually appear to have received the same letter grade. In other words, a typical finding among the few papers that show a grade benefit associated with OER use is that while control students earned Cs on average, treatment students earned slightly higher Cs on average. This is what I mean when I say it’s difficult to determine the practical impact of such an “improvement” – GPAs likely don’t change in this scenario, meaning that students won’t qualify for scholarships at higher rates, improve their chances of getting into graduate school, or notice any other practical benefit.

The place where impact occurs most clearly is around the boundary between C and D final grades. Here a small change in final grade makes the difference between having to retake the course (i.e., paying tuition again, delaying graduation by another semester, possibly paying for the course materials again, etc.) and being able to count the course toward graduation. While the practical impact of the difference between a C and a slightly higher C is difficult to interpret, the difference between a D and a C is as wide as the ocean. Some research in OER has already begun using final grades of “C or better” (or the inverse, the DFW rate) as the measure of interest (e.g., Hilton et al.) and more OER impact research should follow that lead.
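
To make the measure concrete, here is a minimal Python sketch of computing a C or better rate (and its inverse, the DFW rate) from a list of final grades. The function names and the assumption that grades are recorded as plain letters (A, B, C, D, F, W) are illustrative only; an institution that uses plus/minus grades would need its own rule for which grades count toward graduation.

```python
# Illustrative assumption: final grades are plain letters, and A, B, and C
# are the grades that allow a course to count toward graduation.
PASSING_GRADES = {"A", "B", "C"}


def c_or_better_rate(final_grades):
    """Return the share of final grades that are C or better."""
    if not final_grades:
        raise ValueError("at least one final grade is required")
    passing = sum(1 for grade in final_grades if grade in PASSING_GRADES)
    return passing / len(final_grades)


def dfw_rate(final_grades):
    """Return the share of grades that are not C or better (D, F, or W)."""
    return 1 - c_or_better_rate(final_grades)


# Hypothetical section of eight students.
grades = ["A", "B", "C", "C", "B", "D", "F", "W"]
print(c_or_better_rate(grades))  # 0.625
print(dfw_rate(grades))          # 0.375
```

Five of the eight hypothetical students earn a C or better, so the rate is 0.625 and the DFW rate is the remaining 0.375.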

In the S3 framework, success is measured using the C or better rate from before OER were used (an average of multiple previous terms is a more stable measure than the single prior term) and the C or better rate after OER began being used (again, an average of multiple terms is a more desirable measure than a single term) as follows:

$$success\;=\;\frac{\text{C or better rate}_{OER}\;-\;\text{C or better rate}_{control}}{1\;-\;\text{C or better rate}_{control}}$$

The maximum value for success is 1, and this occurs only when the C or better rate is 1 for OER users. This encodes both the idea and the goal that each and every student should succeed in the course. The value is undefined when the C or better rate is 1 for control students, as it isn’t possible to improve success in this case. You can explore the full range of values here.
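
A minimal Python sketch of the success calculation follows, assuming it takes the form of a normalized gain: the improvement in the C or better rate divided by the room that was left for improvement. That form is an assumption here, chosen because it matches the two properties just described (a maximum of 1 when the OER rate is 1, and an undefined value when the control rate is 1). The function and parameter names are hypothetical.

```python
def success(rate_control, rate_oer):
    """Normalized gain in the C or better rate (assumed form of the measure).

    Both arguments are proportions between 0 and 1.
    """
    for rate in (rate_control, rate_oer):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must be between 0 and 1")
    if rate_control == 1.0:
        # No room to improve, so the measure is undefined.
        raise ValueError("success is undefined when the control rate is 1")
    return (rate_oer - rate_control) / (1.0 - rate_control)


print(success(0.5, 1.0))    # 1.0: every OER student succeeded
print(success(0.75, 0.75))  # 0.0: no change
print(success(0.75, 0.5))   # -1.0: the rate fell after the change
```

Note that under this reading the value goes negative when the C or better rate drops, so the measure can register harm as well as benefit.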

Defining and Measuring Scale

Scale is the next most important component in the S3 framework. Scale means “the proportion of students being reached.” If work that is highly impactful in a single setting cannot be scaled to multiple classrooms, it does little to advance the cause of improving success for each and every student. For example, supplementing a classroom instructor with an instructional designer, a researcher, and a pair of graduate students may allow incredible things to happen in that classroom, dramatically increasing the success of the students in the class. However, if there’s no practical way to adapt this model to other classrooms, we can’t use this model to help improve success for all students.

In the S3 framework, scale is measured using the number of students in sections of courses using OER (e.g., the number of students in sections of Intro to Psychology using OER) and the total number of students in all sections of those same courses (e.g., the total number of students in all sections of Intro to Psychology), as follows:

$$scale\;=\;\frac{\text{students in sections using OER}}{\text{students in all sections}}$$

The maximum value for scale is also 1, and this occurs only when all relevant course sections are using OER. This encodes both the idea and the goal that each and every student should be included. Scale can include a single course (e.g., Intro to Psychology) or multiple courses (e.g., all general education courses), depending on where OER is being used at an institution.
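
As a rough Python sketch, scale can be computed as a simple proportion, and extending it from one course to several is a matter of summing enrollments before dividing. The function name, the second course, and all enrollment numbers below are hypothetical.

```python
def scale(students_using_oer, students_total):
    """Proportion of students in the relevant courses who are in OER sections."""
    if students_total <= 0:
        raise ValueError("total enrollment must be positive")
    if not 0 <= students_using_oer <= students_total:
        raise ValueError("OER enrollment must be between 0 and the total")
    return students_using_oer / students_total


# Hypothetical enrollments: (students in OER sections, students in all sections).
courses = {
    "Intro to Psychology": (300, 1200),
    "Intro to Sociology": (500, 800),
}

# A single course.
print(scale(*courses["Intro to Psychology"]))  # 0.25

# Multiple courses: sum the enrollments first, then take the proportion.
oer_students = sum(oer for oer, _ in courses.values())
all_students = sum(total for _, total in courses.values())
print(scale(oer_students, all_students))  # 0.4
```

Summing before dividing weights each course by its enrollment, which keeps the focus on the proportion of students reached rather than the proportion of courses.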

Defining and Measuring Savings

Savings is the final component in the S3 framework. Savings means, “the amount of money spent on course materials by the average student using OER compared to the amount of money spent on course materials by the average control student.” When calculated accurately, the savings measure takes into account several factors:

  • The materials assigned to control students are available at many price points
  • Some control students don’t spend any money on course materials
  • Some OER students spend money on printed copies of OER (or on printing OER)
  • Some OER students spend money on courseware or homework systems
  • Printed copies of OER frequently cost as much as or more than courseware or homework systems (e.g., see the prices of printed OpenStax books on Amazon)

Contrary to popular belief, savings is very difficult to measure accurately, and some estimation and guesswork are almost always involved.

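In the S3 framework, savings is measured using the average amount of money spent by OER users and the average amount of money spent by control students, as follows:

$$savings\;=\;\frac{\text{average spend}_{control}\;-\;\text{average spend}_{OER}}{\text{average spend}_{control}}$$
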
The maximum value for savings is also 1, and this occurs only when no student who was assigned OER spends any money on course materials. This value is undefined when no students in the control group spend any money on course materials, as it isn’t possible for OER users to save money in this case.
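
Here is a minimal Python sketch of the savings calculation, assuming it is the proportional reduction in average spending relative to the control group. That form is an assumption, chosen because it matches the properties above (a maximum of 1 when OER students spend nothing, undefined when control students spend nothing). The function names and dollar amounts are hypothetical. The averages deliberately include students who spent nothing, as well as OER students who paid for print copies or courseware.

```python
from statistics import mean


def average_spend(amounts):
    """Mean spend across all students, including those who spent nothing."""
    return mean(amounts)


def savings(avg_spend_control, avg_spend_oer):
    """Proportional reduction in average spend (assumed form of the measure)."""
    if avg_spend_oer < 0:
        raise ValueError("average spend cannot be negative")
    if avg_spend_control <= 0:
        # Nothing was being spent, so there is nothing to save.
        raise ValueError("savings is undefined when control students spend nothing")
    return (avg_spend_control - avg_spend_oer) / avg_spend_control


# Hypothetical per-student spending in dollars.
control = [120, 120, 60, 0]  # new copy, new copy, used copy, went without
oer = [0, 0, 30, 30]         # two read online, two bought a print copy

print(savings(average_spend(control), average_spend(oer)))
```

With these made-up numbers the averages are 75 and 15 dollars, so OER students spend roughly 80% less than control students.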

Calculating an Overall Score

In aggregating the individual scores into an overall score, we must consider the amount each individual component will contribute to the overall score.

I’ve argued above that success is the most important component of the three in this framework. As I wrote about at length in Taking Our Eye Off the Ball, I believe it is a huge mistake for us to look at 30% graduation rates from US community colleges and say, “the most important thing we can do is make that abysmal outcome less expensive.” (Likewise for the 60% graduation rate from US universities.) Making a 70% failure rate (or a 40% failure rate) more affordable is not the most important work we can do. We have to begin making meaningful progress on student success, and it should be weighted most heavily of the three components in the framework.

I’ve argued above that scale is the next most important component of the framework. Affordability (or savings) is one of the characteristics of a scalable innovation, as students can’t benefit from something they can’t afford. But it takes a lot more to make an educational innovation scale successfully than simply being affordable (or even free) – like how attractive it is to faculty or how easy or hard it is to implement successfully. Inasmuch as scale encompasses savings and a range of other factors, it should be weighted more heavily than savings alone.

That said, I think it is important to continue to include savings as a standalone measure, if for no other reason than the historical importance of cost savings in the research on the impact of OER. But more importantly, including savings as a standalone measure of impact helps us remember the problems and mistakes of the past (and present) with regard to the pricing of course materials. Hopefully, keeping these errors front and center will decrease the chances that new innovations will travel down that path again.

As I have pondered relative weights for the components, it seems that success is at least twice as important as scale, and that scale is at least twice as important as savings. If we use:

$$impact\;=\;4\times success\;+\;2\times scale+1\times savings$$

we get a measure of impact with a maximum value of 7. I like that.
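
The weighted sum can be sketched in a few lines of Python. The weights 4, 2, and 1 come straight from the formula; the function name and the example component values are hypothetical.

```python
# Weights from the impact formula: success counts twice as much as scale,
# and scale counts twice as much as savings.
WEIGHTS = {"success": 4, "scale": 2, "savings": 1}


def impact(success, scale, savings):
    """Combine the three S3 components into a single impact score."""
    return (
        WEIGHTS["success"] * success
        + WEIGHTS["scale"] * scale
        + WEIGHTS["savings"] * savings
    )


print(impact(1, 1, 1))           # 7, the maximum
print(impact(0.5, 0.25, 0.75))   # 2.0 + 0.5 + 0.75 = 3.25
```

In the second, made-up case, success contributes 2.0 of the 3.25 points even though its raw value is lower than the savings value, which is the weighting doing its job.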

Extending the Framework to Educational Innovations Beyond OER

Viewed from a high level, OER is just one of hundreds of innovations whose proponents say they have the potential to improve education. To the degree that is true, there is no reason to develop an OER-specific measure of impact. Indeed, it would be eminently useful to have a common measure of impact we could use to compare a wide range of would-be innovations.

There is nothing OER-specific about S3. You could use it to measure the impact of “things” like learning analytics or iPads or augmented reality. You could use it to measure the impact of “approaches” like problem-based pedagogies or collaborative problem solving pedagogies or active learning pedagogies. The questions S3 keeps us focused on are:

  1. How much does this innovation improve student success?
  2. How many of our students are benefiting from this innovation?
  3. How much money does this innovation save students?

These are all good questions to ask.