The LHC and Education

I’ve always been impressed by the idea of the Large Hadron Collider. It’s an unthinkably expensive, large-scale experimental apparatus designed for the sole purpose of generating and collecting data. Why would countries spend so much money on data? Why would so many people dedicate the better part of their lives to a project like the LHC? Because the so-called “hard” sciences – fields like physics and astronomy – have made the remarkable progress they have in understanding the structure of matter and the nature of the universe because they really care about data. They care about data in a way that educators have a difficult time comprehending, let alone emulating.

The data that we, educators, gather and utilize is all but garbage. What passes for data for practicing educators? An aggregate score in a column in a gradebook. A massive, coarse-grained rolling up of dozens or hundreds of items into a single, collapsed, almost meaningless score. “Test 2: 87.” What teacher maintains item-level data for the exams they give? What teacher keeps this data semester to semester, year to year? What teacher ever goes back and reviews this historical data? After a recent tweet on this topic, a number of colleagues accused me of having physics envy. Believe me, you don’t have to wish you were a physicist to be disappointed by the quality of data educators have access to.
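
To make the contrast concrete, here is a minimal sketch of what item-level records preserve that a single gradebook number throws away. Everything in it is hypothetical and invented for illustration: the students, the items, the skill labels, and the helper names `gradebook_score` and `skill_breakdown`.

```python
from collections import defaultdict

# Hypothetical item-level records: (student, item_id, skill, correct)
responses = [
    ("ana", "q1", "fractions", True),
    ("ana", "q2", "fractions", True),
    ("ana", "q3", "ratios", False),
    ("ana", "q4", "ratios", False),
    ("ben", "q1", "fractions", False),
    ("ben", "q2", "fractions", False),
    ("ben", "q3", "ratios", True),
    ("ben", "q4", "ratios", True),
]


def gradebook_score(records, student):
    """The usual rolled-up number: percent correct across all items."""
    own = [correct for s, _, _, correct in records if s == student]
    return 100 * sum(own) / len(own)


def skill_breakdown(records, student):
    """Percent correct per skill, the detail the rolled-up score hides."""
    tally = defaultdict(list)
    for s, _, skill, correct in records:
        if s == student:
            tally[skill].append(correct)
    return {skill: 100 * sum(v) / len(v) for skill, v in tally.items()}
```

In this made-up data both students get the same gradebook score, yet the per-skill breakdown shows they need help with opposite things. That difference only exists if the item-level records are kept.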

I’m beginning to believe that we’ve got it completely backwards. For decades we’ve been trying to use technology to improve the effectiveness of education. How, specifically, have we tried to use technology? At a high level, we’ve tried to use it to deliver content to learners. The goal has been to “find something that works,” and then deliver that something (interactive content, etc.) to learners at high fidelity and low cost. In our attempts to deliver effective content at scale, I believe we have had a nationwide (if not worldwide) encounter with the reusability paradox, which I first wrote about at length in 2001. Briefly stated, the reusability paradox says that, due to context effects, the pedagogical effectiveness of content and its potential for reuse are orthogonal to one another. This finding is too inconvenient to accept, as it would destroy or severely maim the prominent paradigm of educational technology research, and so it has been roundly ignored by the educational research community.

While using technology to deliver content seems to have had no noticeable impact (or even a slightly negative one) on the effectiveness of education, using technology to deliver content has had a huge impact on the accessibility of education. Think of distance learning… Think of opencourseware and open educational resources… Think of the millions of people who now have access that never would have had access otherwise. The impact of using technology to deliver content on increasing access to education is completely unassailable and totally undeniable.

So, if using technology to deliver content is not improving the effectiveness of education, is there another way we might use technology that can? I believe there is. I believe it so strongly that for the first time in several years I am opening a new line of research. I believe (and I fully admit that it is only a belief at this point) that using technology to capture, manage, and visualize educational data in support of teacher decision making has the potential to vastly improve the effectiveness of education. Think of it as “educational data mining” or “educational analytics.” For example, think of all the data, algorithms, and resources that go into selecting ads to show in search engine results and other places around the web, and then think of using all that horsepower to make suggestions to teachers about appropriate opportunities to intervene with students.

The Open High School of Utah is the first context in which I’m studying this use of technology. Because it is an online high school, every interaction students have with content (the order in which they view resources, the time they spend viewing them, the things they skip, etc.) and every interaction they have with assessments (the time they spend answering them, their success in answering them, etc.) can all be captured and leveraged to support teachers. The OHSU teaching model, which we call “strategic tutoring,” involves using these data to prioritize which students need the most help and enabling brief tutoring sessions. A teacher’s typical day involves visiting the dashboard, viewing the first student in a prioritized list of students, seeing what s/he needs help on, and engaging him/her by Skype, phone, IM, or other means, for a very brief, very targeted individual tutoring session. Then the next student, then the next student, etc. Students who are on track or working ahead in the online curriculum don’t have to wait for an interaction with the teacher (they’re succeeding, after all), and those who need help get it – individualized, just in time, and sometimes before they even know they need it. From a caring human being – not a supposedly intelligent tutoring system.
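
As a rough illustration of the kind of prioritized list described above, here is a minimal sketch. It is not the OHSU dashboard’s actual logic: the fields, the weights in `need_score`, and the threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class StudentActivity:
    name: str
    days_since_last_login: int
    recent_quiz_accuracy: float  # fraction correct, 0.0 to 1.0
    lessons_behind_pace: int     # negative means working ahead


def need_score(s):
    """Higher means more urgent. The weights are arbitrary placeholders."""
    return (
        2.0 * max(s.lessons_behind_pace, 0)
        + 1.0 * s.days_since_last_login
        + 10.0 * (1.0 - s.recent_quiz_accuracy)
    )


def tutoring_queue(students, threshold=5.0):
    """Students at or above the threshold, most urgent first."""
    flagged = [s for s in students if need_score(s) >= threshold]
    return sorted(flagged, key=need_score, reverse=True)


# Hypothetical roster: Ana is working ahead, Ben and Cy are struggling.
roster = [
    StudentActivity("Ana", 0, 0.95, -2),
    StudentActivity("Ben", 4, 0.60, 3),
    StudentActivity("Cy", 1, 0.40, 1),
]
queue = [s.name for s in tutoring_queue(roster)]
```

With these invented numbers the student who is working ahead never enters the queue, and the teacher sees the other two ordered by urgency. The interesting research question is what the real signals and weights should be, which is exactly what a sketch like this leaves open.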

Now, if the OHSU weren’t delivering content online we couldn’t capture all this data. So in one sense, it’s key to deliver content online – if only to get the types of data we need to support teachers supporting students. But currently, we’re stopping short, confusing the means for the end.

Another realization that comes part way down this path is that our instructional design programs may teach people how to design instruction that is motivating and engaging, but we don’t even begin to teach people how to design materials and systems that capture the right kinds of data. We don’t even discuss what the “right” kinds of data might be.

Coming back to the LHC, I think meaningful progress in education will depend on educators becoming infected with a passion for data like the LHC embodies. Not rolled up percentile scores, coarse-grained data that obscure all the meaningful details we might care about. We need access to real-time data on every individual student every day of the year, we need tools and techniques for supporting teachers in interpreting the data, we need new teaching models that leverage the existence of these data and tools, etc. This is what I think technology-enhanced education is supposed to be.

The investment it would take to deploy such an infrastructure would rival the cost of the LHC, but would be almost impossible to make – because educators either don’t care about data or have a vision of data that is limited by their own experience recording things in a gradebook or spreadsheet. Using technology in creative ways could provide us with so much more data it would boggle the imagination… It could transform the teacher’s work from one based on hunches and intuitions to one actually based on data. And lo and behold, we might actually move the needle a bit when we combine the best of hardcore empiricism with the best of caring, nurturing people.

We’ll certainly never meet Bloom’s 2 sigma challenge if we think the proper role of technology in education is simply delivering content (whether interactive, intelligent, or otherwise). However, if we get serious about capturing and using data to support teacher decision-making and improve student learning, we may have something.

  • Really interesting post. I like the idea but the LHC and similar data intensive projects are being carried out by highly focussed researchers dedicated to one project albeit one with multiple outcomes. I do wonder whether, at university level, we will ever be able to achieve the level of sophistication you are talking about with relation to ‘educational analytics’ as many do not have any educational training behind them and have other ‘priorities’. It may well be easier to implement at high school though. I will think about it some more.

  • This reminds me of what Candace Thille and her team are doing with CMU’s Open Learning Initiative – that is quite a unique OER project, in that they gather data on every interaction, designed it specifically to gather the kinds of data they need, and use it continuously to experiment, subtly tweaking presentation or order of content etc.

    I guess one thing that would have to happen in terms of OER/distance learning is to have systems more like OLI, with lots of interaction, and smaller modules, rather than just big PDFs – where all you can tell is “was downloaded X number of times”, you don’t know if people spent 5 hours reading it, printed it out, never looked at it again, etc.

    But the issue with OER and data gathering is of course also tricky – CMU doesn’t let anyone else download and install OLI, partly because they want to keep all the data themselves…

    I’m thinking about what kind of data we should be tracking at the Peer2Peer University, and how we could use it more proactively.


  • I’m also a huge fan of what Candace and the OLI team are doing. However, their model is designed to support independent study without the interaction of human beings like faculty and tutors. I’m extremely skeptical of this model over the long term. What I’m after with this new line of work is achieving the proper balance in which we let the machines do what they do well, and reserve to people that which they do better than machines.

  • Hi David-
    “The data that we, educators, gather and utilize is all but garbage.” That’s got to be one of the best single lines I’ve read in a long time.

    Here’s a quote from Stephen Jay Gould that relates to your post. It’s from his 1977 book Ever Since Darwin.
    “… ecology is the study of organic diversity. It focuses on the interaction of organisms and their environments in order to address what may be the most fundamental question in evolutionary biology: ‘Why are there so many kinds of living things?’ … During the first century of Darwinism, ecologists pursued this question with little success. In the face of life’s overwhelming complexity, they chose the empirical route and amassed storehouses of data on simple systems in limited areas. Now, nearly twenty years after the centennial of Darwin’s Origin of Species, this poor sister among evolutionary disciplines has become a leader. Spurred by the efforts of scientists with a mathematical bent, ecologists have built theoretical models of organic interaction and applied them successfully to explain data from the field. We are finally beginning to understand (and quantify) the causes of organic diversity.”

    Ecologists apparently devoted 120 years to gathering descriptive data before it became really useful. This seems like a normal trajectory for scientific domains. I suspect some educators would take exception to your characterization of their research results as garbage. But your point is well taken. We’re still in the early decades of education’s 120-year trek.

    Nice post. Very fun.


  • Interesting post and comments. I like the OP concept of “educational analytics”. In a blog post I wrote a while back, I noodled around with the idea of co-opting or exapting some of the metrics and analytics that direct-email marketers use, transforming them for use in education. (I did this in conjunction with some ideas around learning community design.)

    Co-opting metrics from related fields is a way to quickly bootstrap this kind of research, addressing the concerns of @Mark Smithers and @Stian Haklev about the relative paucity of useful existing metrics in education. For example, email marketers talk about email “opens”, “click throughs” and “conversions”. These three concepts apply when a marketing email message contains some kind of “call to action”. Did people look at the message, did they click through to some landing page, did they do the thing the landing page prompted them to do? These questions can also be asked of educational resources.

    The thing is, on the back of relatively simple metrics like this, some fairly sophisticated analytics can be applied, to derive information about the relative effectiveness of different messages and campaigns. I am not a numbers wonk, but I sense that it would be useful to try to bastardize these kinds of analytics for educational contexts. Even the failure of this effort would tell us something about what it is we need to measure in terms of “educational analytics”.

    Great post!
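
The three marketing metrics mentioned in the comment above can be sketched as a simple funnel over an educational resource. This is only an illustration: the function name `funnel_rates`, the counts, and the choice to measure each stage relative to the one before it are assumptions, and marketers define these rates in more than one way.

```python
def funnel_rates(delivered, opened, clicked, completed):
    """Open, click-through, and conversion rates for one resource.

    Each rate is measured against the previous stage (an assumption;
    other definitions divide every stage by the number delivered).
    """
    def rate(part, whole):
        # Guard against empty stages rather than dividing by zero.
        return part / whole if whole else 0.0

    return {
        "open_rate": rate(opened, delivered),
        "click_through_rate": rate(clicked, opened),
        "conversion_rate": rate(completed, clicked),
    }


# Hypothetical counts for one reading assignment: sent to 200 students,
# opened by 120, linked exercise visited by 60, exercise finished by 45.
rates = funnel_rates(delivered=200, opened=120, clicked=60, completed=45)
```

Comparing these rates across resources, or across versions of the same resource, is the sort of analysis the comment suggests borrowing.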

  • Peter

    Perhaps you should talk to a special educator?
    As a special education teacher I have been part of a large body of educators who, as required by federal law and state guidelines, have been providing Individual Education Plans specifically designed to meet the educational needs of a student for many years. This has been done in a data-driven, decision-based model for some 30 years.

    Now this model of educational programming does not have data as specific as you will have – how long on a website and how many clicks. But it does take into consideration the specific skills exhibited by the student. It also requires the teacher to adapt their pedagogy and (perhaps) curriculum to enable the student to succeed.

    However, this is a difficult pedagogical process. It requires a lot of effort and skill on the part of the teacher. It also takes time. And, even with good data, it is not always possible to make the educational plan work well for the student. It also may not be supportable in higher education.

    There is a considerable body of knowledge in this area that you might consider looking at.

  • May I recommend that you go and read Seymour Papert’s papers on using technology for learning. It’s not about delivering content, and learning is not about acquiring content. Technology gives us tools for inquiring about the world, and inquiring about the world is learning.

  • Cecilia d’Oliveira

    David, there’s an MIT physics professor named David Pritchard who has been using student performance data to understand what helps MIT students learn physics. His research group is called RELATE (Research in Learning, Assessing, and Tutoring Effectively), and their publications are listed online. I’d be interested in your feedback on their work.