The Appeal of Annotations
Last week, I had the honor of speaking at Twitter’s Chirp conference, where I talked a little about the New York Times’ upcoming integration of @anywhere and shared some fun statistics I had unearthed along the way. For instance, someone tweets a link to a New York Times story once every 4 seconds. Like most developers there, I will admit to some mixed feelings about the conference beforehand, but I was pleasantly surprised by some of the features on the Twitter roadmap (and I enjoyed meeting some of the developers there face-to-face). The most exciting feature? Annotations.
This upcoming feature allows programs to submit up to 2K worth of annotations with a tweet. Annotations themselves are triples of a namespace, key, and value, and there are relatively few restrictions beyond that. Presumably multiple values for a given key are allowed, although it’s not yet clear how they will be represented in the format. Beyond that, Twitter is stepping back, preferring to see what standards and conventions emerge from the developer community rather than dictating the usage of annotations.
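To make the triple idea concrete, here is a minimal Python sketch. Nothing in it is Twitter's actual format: the `Annotation` class, the JSON layout (namespace, then key, then a list of values) and the reading of "2K" as 2048 bytes of serialized payload are all my own assumptions for illustration.

```python
import json
from dataclasses import dataclass

# Assumption: "2K" is read here as 2048 bytes of serialized annotations.
MAX_ANNOTATION_BYTES = 2048


@dataclass(frozen=True)
class Annotation:
    """One (namespace, key, value) triple. Hypothetical type, not an official one."""
    namespace: str
    key: str
    value: str


def serialize(annotations):
    """Group triples by namespace, then key; repeated keys collect into a list.

    The nested-JSON layout is a guess; the real wire format is not yet known.
    """
    grouped = {}
    for a in annotations:
        grouped.setdefault(a.namespace, {}).setdefault(a.key, []).append(a.value)
    return json.dumps(grouped, sort_keys=True, separators=(",", ":"))


def fits_limit(annotations):
    """True if the serialized payload stays within the assumed size limit."""
    return len(serialize(annotations).encode("utf-8")) <= MAX_ANNOTATION_BYTES
```

Storing every value in a list is one way to sidestep the open question of multiple values per key: a single value is simply a list of length one.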
It’s a smart decision, but not one without drawbacks. Too much passivity could lead to early fragmentation and confusion: for instance, we’ll probably see a proliferation of competing annotation formats for the url, info, and possibly even media namespaces. Such confusion might delay the adoption of annotations by Twitter clients, since inconsistent annotations are harder to support than none at all. It is also possible that the most prominent clients will carve out their own specific functionality via privately documented annotations, so there might be a tweetdeck or seesmic namespace before there are general ones.
While I think Twitter, Inc. should indeed stay away from specifying annotations, I think it would be very helpful if they provided an official space (perhaps a wiki on dev.twitter.com) for developers to document and discuss possible uses for annotations. And it would be great if a few suggestions were already there for developers looking to get started before annotations launch. So, what kind of annotations might make sense? Here are some thoughts of mine:
- URL information – I don’t think annotations will eliminate tiny URLs from tweets (at least not in the next few years), but I do think it makes sense for annotations to contain expanded or canonical URLs for the tiny URLs mentioned in the body. This would allow researchers to analyze URLs years after the shortener vanishes (although it could also create an opportunity for phishing where the tiny URL does NOT go to the canonical destination provided).
- Media – the MediaRSS standard seems like a natural fit for services like Twitpic and YFrog, allowing tweets to specify thumbnail URLs, dimensions, etc.
- Product information – ReadWriteWeb wrote an extensive piece about annotations where they suggested that existing product identifiers like ISBNs or other such things would make for excellent annotation material. I agree.
- Urgency – some sort of mechanism for indicating that some tweets are breaking news or urgent in some other way has been suggested by a few developers I spoke with. Of course, this could be abused by that one guy you know who always sends all his emails with maximum priority.
- Meta – A namespace for specifying information about the annotations in the document. It’s unclear if we would want one place for this, or a meta key in each namespace.
- CSS – Finally, with the font-family key, you CAN tweet in Comic Sans. This is me being silly, but one could theoretically see some artsy use cases for tweet-specific CSS styling specified via annotations.
- Linked Data – The most intriguing potential use case of annotations is Linked Data. Most tweets from the @nytimes Twitter feed are stories with internal cataloging information that says what the story is about (the people, the places, organizations, etc.). This information is useful in itself, but when it is linked to an outside taxonomy like DBpedia, it becomes compatible with global taxonomies. Meaning, you can search for tweets tagged ‘Twitter (Organization)’ and find stories tagged with the equivalent term in any of the linked taxonomies.
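The phishing worry in the URL item above suggests that a careful client should not trust an expanded-URL annotation blindly. Here is a rough Python sketch of the check I have in mind. The function names are hypothetical, and `resolve` is a stand-in that the caller supplies (in practice it would follow the shortener's redirect; here it is injected so the logic can be tried without a network).

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Reduce trivial differences: lowercase scheme and host, drop the
    trailing slash and any fragment. Deliberately simplistic."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def annotation_matches(short_url, claimed_url, resolve):
    """Compare where the short URL really leads with what the annotation claims.

    `resolve` is any callable mapping a short URL to its real destination.
    """
    return normalize(resolve(short_url)) == normalize(claimed_url)


# Usage with a fake resolver (all URLs below are made up for the example):
fake_redirects = {"http://sho.rt/abc": "http://example.com/story/"}
honest = annotation_matches("http://sho.rt/abc", "http://example.com/story", fake_redirects.get)
spoofed = annotation_matches("http://sho.rt/abc", "http://evil.example/story", fake_redirects.get)
```

A real client would also need to decide what to do once the shortener is gone and the check can no longer be run, which is exactly the archival case the annotation is meant to help with.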
These are just a few ideas off the top of my head, and I’m looking forward to what proposals other developers might make. Of course, the real killer use of annotations is not in displaying tweets on the timeline but in search and the streaming API. Imagine being able to retrieve tweets tagged with a certain annotation key, or key/value, or even with a certain namespace. Combine this with a generalized annotation scheme like Linked Data and it’s suddenly possible to search for all tweets with images or links to books or stories about fine dining. Or so the hopes go. Reality will likely be messier.
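The three kinds of lookup described above (by namespace, by key, by key/value) are easy to sketch on the client side. This Python snippet assumes tweets arrive as dictionaries with an `annotations` field shaped as namespace, then key, then list of values. That shape and the sample data are invented for illustration; the search and streaming APIs may expose something quite different.

```python
def matches(tweet, namespace, key=None, value=None):
    """True if the tweet carries the namespace, and (optionally) the key,
    and (optionally) that exact value under the key."""
    ns = tweet.get("annotations", {}).get(namespace)
    if ns is None:
        return False
    if key is None:
        return True
    values = ns.get(key)
    if values is None:
        return False
    return value is None or value in values


def search(tweets, namespace, key=None, value=None):
    """Filter an iterable of tweets, e.g. a batch read from a stream."""
    return [t for t in tweets if matches(t, namespace, key, value)]


# Made-up sample tweets:
tweets = [
    {"text": "Great read", "annotations": {"book": {"isbn": ["9780000000001"]}}},
    {"text": "Lunch!", "annotations": {"media": {"thumbnail": ["http://example.com/t.jpg"]}}},
    {"text": "No annotations here"},
]

with_images = search(tweets, "media")
with_isbn = search(tweets, "book", "isbn")
that_book = search(tweets, "book", "isbn", "9780000000001")
```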
For starters, adoption of annotations will probably be halting and inconsistent (that’s the subject of another post), largely dependent on the abilities of Twitter client developers to make it happen. Text search is not going away anytime soon. And annotations will not eliminate confusion in themselves. For instance, it seems to me that the majority of annotation use cases will be to describe what the tweet is linking to, rather than the contents of the tweet itself. Similarly, I imagine we will see automated annotation in some clients, where the word Paris in the text leads to the tweet being annotated as being about Paris the city when it’s really not about that at all. Of course, this is all just quibbling that would make a research librarian proud, but it matters to some.
Still, let’s not lose our enthusiasm for annotations. This could be big, and I’m excited to see where it will wind up. Let’s get out there and start coding.
Update: And then, just like that, Facebook announced the Open Graph Protocol, which, among other things, specifies an annotation namespace that can include things like book information, people mentioned, etc. It’s an attractive standard defined for Facebook’s “Like” mechanism that will probably be quickly adopted within Twitter annotations as well. Unfortunately, with the exception of UPCs and ISBNs, all Open Graph fields are arbitrary text. This means there will still be a lot of ambiguity within the annotations compared to Linked Data (for instance, people might tag “Barack Obama”, “Obama”, “Barack Hussein Obama”, “President Obama” all for tweets about Obama), but it also means there might be greater usage.
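The Obama example can be shown in a few lines of Python. The alias table below is hand-built and purely hypothetical (a real system would need a proper entity-resolution step), and the single identifier follows the usual DBpedia resource URI pattern.

```python
# Hypothetical, hand-maintained alias table mapping free-text labels to one URI.
OBAMA = "http://dbpedia.org/resource/Barack_Obama"
ALIASES = {
    "barack obama": OBAMA,
    "obama": OBAMA,
    "barack hussein obama": OBAMA,
    "president obama": OBAMA,
}


def canonical_subject(label):
    """Map a free-text tag to a shared identifier, or None if it is unknown."""
    return ALIASES.get(label.strip().lower())


free_text_tags = ["Barack Obama", "Obama", "Barack Hussein Obama", "President Obama"]

# Four different strings as free text, but a single subject once linked.
distinct_labels = set(free_text_tags)
distinct_subjects = {canonical_subject(tag) for tag in free_text_tags}
```

A search on the free-text field has to know all four spellings in advance; a search on the linked identifier needs only one.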