I was recently asked what I thought of Cory Doctorow’s classic Metacrap paper. He was, of course, dead on when he wrote it. The nagging question at the time was “what alternatives do we have?” And the sad answer was “not many.”
Creating metadata, as traditionally conceived, is the ultimate form of unidirectional asynchronous communication (aka a message in a bottle). The cataloger attempts to imagine a number of individuals who, at some point in the future, will be trying to locate a particular resource or one like it, and asks him/herself, “how could I best describe this resource so that these imaginary inhabitants of the future will be able to find it?”
Best practice has been to create controlled vocabularies, publish those so that future searchers will be able to find them, and mandate and guarantee consistency of application of these terms to resources by catalogers. The description of the controlled vocabulary becomes the secret decoder ring that an individual needs in order to find resources. Quite without intent, this practice doubles the searcher’s task for all but the most trivial searches (like finding a book when you know the author’s name): now a searcher must first find the decoder ring; then, they must figure out how to use it to find the resource they’re looking for.
On very rare occasions, and I mean that seriously, technology comes riding to the rescue. Today we do have at least one viable alternative to the scenario Cory outlined what seems like an eternity ago. Delicious, Flickr, and other sites show the first step. First, instead of having people catalog resources they know about for others to find, they let people catalog things they know about so that *they* can find them again later. No imagining future searchers; no wondering what terms will be meaningful to them. Also, no incentive problems – why spend the time cataloging things for the possible benefit of imaginary people in the future, when you could catalog things for your own benefit?
Delicious and Flickr get a quarter of the way to the last step. When I tag a resource in delicious with “python” or “ruby”, the tag itself becomes a link to all the resources other people have tagged with the same term. That’s pretty useful. What would be even more useful, and we’re going to be showing this in May, is a collaborative filtering system or recommender system (think Amazon’s book recommendations) that matches you with other people like you based on your actual tagging behavior. You just go on bookmarking, blogging, and tagging for your own benefit, and at no extra cost to you the system will help you find other resources and, even more valuable, like-minded people.
In other words, one solution to the problem Cory outlined is the free and open sharing of catalog data created by an individual for their own use, and the automation of person to person recommendations of resources and people based on that data. Notice that folksonomies don’t solve the problem by themselves – free and open access to a large collection of folksonomic data is necessary. We’re calling the work we do in this area “folksemantic” because it blends folksonomic approaches with semantic web approaches (which strictly speaking shouldn’t work – which is why we love it).
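To make the idea concrete, here is a minimal sketch of the two pieces described above: a tag that links to everything anyone has filed under it, and a simple recommender that matches people by their tagging behavior. It is only an illustration, not the system we will be showing. The sample bookmarks, the function names, and the choice of Jaccard overlap between tag sets as the similarity measure are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample data: (user, resource, tag) triples, as a
# bookmarking site might record them. Not taken from any real service.
BOOKMARKS = [
    ("alice", "python.org", "python"),
    ("alice", "djangoproject.com", "python"),
    ("alice", "ruby-lang.org", "ruby"),
    ("bob", "python.org", "python"),
    ("bob", "scipy.org", "python"),
    ("carol", "flickr.com/photos/cats", "cats"),
]


def build_indexes(bookmarks):
    """Build tag -> resources, user -> tags, and user -> resources maps."""
    tag_index = defaultdict(set)
    user_tags = defaultdict(set)
    user_resources = defaultdict(set)
    for user, resource, tag in bookmarks:
        tag_index[tag].add(resource)
        user_tags[user].add(tag)
        user_resources[user].add(resource)
    # Return plain dicts so later lookups never create entries by accident.
    return dict(tag_index), dict(user_tags), dict(user_resources)


def jaccard(a, b):
    """Overlap between two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, user_tags, user_resources):
    """Return (like-minded people, unseen resources) for one user.

    People are ranked by tag overlap (assumed measure: Jaccard). Each
    unseen resource is scored by the summed similarity of the people
    who bookmarked it.
    """
    my_tags = user_tags.get(user, set())
    my_resources = user_resources.get(user, set())
    neighbours = []
    scores = defaultdict(float)
    for other, other_tags in user_tags.items():
        if other == user:
            continue
        similarity = jaccard(my_tags, other_tags)
        if similarity == 0:
            continue  # no shared tags, so no basis for a match
        neighbours.append((other, similarity))
        for resource in user_resources[other] - my_resources:
            scores[resource] += similarity
    neighbours.sort(key=lambda pair: (-pair[1], pair[0]))
    resources = sorted(scores, key=lambda r: (-scores[r], r))
    return neighbours, resources


tag_index, user_tags, user_resources = build_indexes(BOOKMARKS)
people, resources = recommend("alice", user_tags, user_resources)
print(sorted(tag_index["python"]))  # everything anyone tagged "python"
print(people)                       # people who tag like alice
print(resources)                    # things they saved that alice has not
```

Note that nobody in this sketch catalogs anything for anyone else. Every triple is something a person saved for their own use, and the recommendations fall out of pooling that data, which is why open access to a large collection of it matters.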
I don’t think Cory’s piece was dead on when he wrote it, any more than it is now. It is in itself a series of strawmen, implying that people who believed metadata to be useful insisted on it being 100% pure and accurate. Not so then, not so now. He generally paints metadata in a bad light, but the arguments…well, how about “In meta-utopia, the lab-coated guardians of epistemology sit down and rationally map out a hierarchy of ideas, something like this:” – he proceeds to display the flaws of such an approach. There are no “lab-coated guardians”, or at least none mapping such hierarchies who are unaware of the problems raised. Straw. Man.
The problem was that a lot of people seemed to accept the arguments of the piece at face value, and personally I encountered increased resistance to the notion of metadata having value (in the context of RDF). It’s taken a while for systems like del.icio.us to appear that make really obvious counterpoints to Cory’s thesis, but it was bound to happen. Just seems like good ideas got pushback based on mythology.
An interesting way of reading the piece is to drop the “meta” part of “metadata” wherever it’s used. Pretty much all the points he makes on the lousiness of metadata apply equally well to data. Yet that stuff seems to be very useful. So I recommend dropping the “Meta” from the title to get a more accurate description of the content.
While I’m ranting... I don’t understand why you suggest folksonomic and Semantic Web approaches “strictly speaking shouldn’t work” together. Seems like a match made in heaven to me.
See also: http://www.holygoat.co.uk/projects/tags/
This sounds like something I have been telling people to watch for since tagging first hit the radar screen: tagging is the first step. Algorithmic clusters are a nice second step. But the real fruit will come when the linkages, filters, and predictors go multiple layers deep, not just on connections, but on behaviors. If people don’t see this, it’s no surprise that they don’t see how fantastically capable such systems can be in the future, and feel that tagging will eventually just lead to a big, fractious mess.