Today and tomorrow I’m at the EdTech Efficacy Research Academic Symposium in Washington, DC. The conversations here have been wonderful and have reminded me of something…

For many years, several friends and I have argued about the following question:

After accounting for all other differences - differences in a student’s age, race, gender, income, and prior academic success; differences in school environments; differences in teachers; differences in support available from friends, family, and other out-of-school sources; etc. - what is the theoretical upper limit on the impact a specific textbook, digital learning platform, or other edtech product can have on the educational measures we care about (e.g., final grade, completion rate, time to graduation, satisfaction)?

If we don’t have a notion of the maximum potential impact these kinds of tools can have on measures we care about, how can we judge their effectiveness? For example, if the upper bound is +0.43 letter grades, then we would interpret a product achieving a lift of +0.2 letter grades in one way. But if the upper bound is actually +1.7 letter grades, we would interpret that same lift of +0.2 letter grades in an entirely different way.
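To make the comparison concrete, here is a minimal sketch (the +0.43 and +1.7 upper bounds are the hypothetical figures from the example above, not real estimates): the same observed lift can be expressed as a fraction of whatever we believe the theoretical ceiling to be.

```python
def fraction_of_ceiling(observed_lift: float, upper_bound: float) -> float:
    """Express an observed lift as a share of a hypothesized maximum impact."""
    if upper_bound <= 0:
        raise ValueError("upper bound must be positive")
    return observed_lift / upper_bound

lift = 0.2  # observed lift, in letter grades

# Same product, same lift - very different interpretations:
print(round(fraction_of_ceiling(lift, 0.43), 2))  # ~47% of the possible impact
print(round(fraction_of_ceiling(lift, 1.7), 2))   # ~12% of the possible impact
```

Under the first ceiling the product captures nearly half of the achievable impact; under the second, barely a tenth - which is the whole point of wanting the upper bound pinned down.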

While it’s interesting - and even useful - to compare the measures (like final grade) associated with different products, it feels like this work is ungrounded in a way that unsettles me.

I have some thoughts on the topic, but right now I’m just putting this out there and wondering what other people think…