Gravity, Bandwidth, and Tokens: Fundamental Constraints on Design

Back in the mid-1990s I read an absolutely amazing article that had a lasting impact on my thinking. Despite looking for it several times over the years, I’ve never been able to find it again. This was the era of 14.4k, 28.8k, and 56k modems, when we used our home landlines to dial up and connect to the internet. The article’s main argument was that, just as architects have to understand and account for gravity in their designs of bridges and buildings, web architects have to understand and account for bandwidth in their website designs. Back in the day, including too many large images on a webpage could “weigh it down” to the point of “collapse.” Your 28.8k connection provided so little bandwidth to your home that you simply couldn’t download that much data in a reasonable amount of time, so after waiting a minute for the page to load you just gave up and went somewhere else.

I was fascinated by the metaphor “data has weight,” and excited by the creative work of making designs that both minimized that weight and distributed it effectively across information architectures (I was running a web design and internet services startup in the mid-1990s). If data has weight, then bandwidth is a fundamental constraint on web designs in the same way that gravity constrains the designs of buildings and bridges.

Recently I’ve been wondering: what is the equivalent, fundamental constraint on our designs in today’s era of generative AI? The answer is: tokens.

“Tokens have weight,” but they don’t weigh down the user experience by making things slow. Instead, they can weigh it down by making services so expensive that would-be users can’t afford to use them. For example, OpenAI’s ChatGPT Pro subscription, which includes “o1 pro mode, a version of o1 that uses more compute to think harder [i.e., generate more tokens] and provide even better answers to the hardest problems,” costs $200 per month. And beyond “reasoning” models from big providers, poorly designed AI applications from startups can waste tokens, driving up prices for users and/or causing them to hit usage caps sooner.
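To make that “weight” concrete, here is a back-of-the-envelope sketch of how token counts translate into per-request cost. The prices, token counts, and the `request_cost` helper are all illustrative assumptions, not any provider’s actual rates:

```python
# Back-of-the-envelope token cost estimate.
# These per-million-token prices are illustrative assumptions, not real rates.
INPUT_PRICE_PER_M = 3.00    # dollars per million input tokens (assumed)
OUTPUT_PRICE_PER_M = 15.00  # dollars per million output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the assumed per-million-token prices."""
    return (input_tokens * INPUT_PRICE_PER_M +
            output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A lean app: 1,000 input tokens and 500 output tokens per request.
concise = request_cost(1_000, 500)

# A wasteful app that stuffs 20,000 tokens of unneeded context into every
# request and generates 5,000 tokens of output per answer.
wasteful = request_cost(20_000, 5_000)

print(f"concise:  ${concise:.4f} per request")
print(f"wasteful: ${wasteful:.4f} per request")
print(f"ratio:    {wasteful / concise:.1f}x")
```

Even at these made-up prices, the wasteful design costs over ten times more per request than the lean one, and that multiplier passes straight through to users as higher subscription prices or earlier usage caps.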

In the early days of the internet there was a lot of interest in novel image compression techniques as a way to decrease the “weight” of data on webpages. We’re just beginning to explore these kinds of optimizations for LLMs, with prompt caching being a good example. At the same time that we were learning to use compression to make images weigh less, research and development in core internet infrastructure was effectively “decreasing gravity” by making broadband connections faster and more affordable. There is a huge financial incentive for the OpenAIs and Googles of the world to make the same kinds of advances in generative AI infrastructure in order to decrease their costs. Early examples include custom chips that generate more tokens, faster, using less power, like Groq’s Language Processing Unit.
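Prompt caching’s savings can be sketched with the same kind of arithmetic. The sketch below assumes a hypothetical 90% discount on cached input tokens and an invented per-million price; real providers’ discounts, prices, and cache rules differ:

```python
# Sketch of why prompt caching reduces "token weight."
# The price and the 90% cached-token discount are assumptions for
# illustration; actual provider pricing and cache behavior vary.
INPUT_PRICE_PER_M = 3.00  # dollars per million input tokens (assumed)
CACHED_DISCOUNT = 0.10    # cached tokens billed at 10% of full price (assumed)

def conversation_cost(system_prompt_tokens: int, turns: int,
                      turn_tokens: int, cached: bool) -> float:
    """Input-token cost of a chat that resends its system prompt every turn."""
    total = 0.0
    for turn in range(turns):
        # After the first turn, the long shared prefix can come from cache.
        rate = CACHED_DISCOUNT if (cached and turn > 0) else 1.0
        total += system_prompt_tokens * INPUT_PRICE_PER_M * rate / 1_000_000
        total += turn_tokens * INPUT_PRICE_PER_M / 1_000_000
    return total

# A 50,000-token system prompt resent over 20 turns of 500 tokens each.
uncached = conversation_cost(50_000, 20, 500, cached=False)
with_cache = conversation_cost(50_000, 20, 500, cached=True)
print(f"uncached:   ${uncached:.2f}")
print(f"with cache: ${with_cache:.2f}")
```

Under these assumptions, caching the long shared prefix cuts the conversation’s input-token bill by more than 80%, which is exactly the kind of “weight reduction” image compression gave us thirty years ago.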

Given how quickly things are advancing with generative AI, we should all be “skating to where the puck is going to be” in our thinking about how to use these tools to support learning. At the same time, there are students in classrooms today, trying to learn and grow and develop before the puck gets to wherever it’s going. Being thoughtful about tokens today, the same way we had to be careful about bandwidth 30 years ago, will serve us all well.