The Appeal of Annotations

Last week, I had the honor of speaking at Twitter’s Chirp conference, where I talked a little about the New York Times’ upcoming integration of @anywhere and shared some fun statistics I had unearthed along the way. For instance, someone tweets a link to a New York Times story once every 4 seconds. Like most developers there, I will admit to some mixed feelings about the conference beforehand, but I was pleasantly surprised by some of the features on the Twitter roadmap (and I enjoyed meeting some of the developers there face-to-face). The most exciting feature? Annotations.

This upcoming feature allows programs to submit up to 2K worth of annotations with a tweet. Annotations themselves are triples of a namespace, key, and value, and there are relatively few restrictions beyond that. Presumably multiple values for a given key are allowed, although it’s not yet clear how they will be represented in the format. Beyond that, Twitter is stepping back, preferring to see what standards and conventions emerge from the developer community rather than dictating the usage of annotations.
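To make that shape concrete, here is a minimal Python sketch of an annotation as a namespace/key/value triple, with a size check against the 2K budget. The wire format has not been published, so the JSON layout, the `Annotation` class, the `serialize` helper, the placeholder values, and the reading of "2K" as 2048 bytes of UTF-8 are all assumptions made for illustration, not Twitter's API.

```python
import json
from dataclasses import dataclass

# Assumption: "2K" is read here as 2048 bytes of UTF-8 encoded JSON.
MAX_ANNOTATION_BYTES = 2048


@dataclass(frozen=True)
class Annotation:
    """One annotation: a (namespace, key, value) triple."""
    namespace: str
    key: str
    value: str


def serialize(annotations):
    """Encode annotations as JSON, grouped by namespace and then by key.

    Repeated keys are collected into a list. That is one guess at how
    multiple values might be represented, not a documented format.
    """
    grouped = {}
    for a in annotations:
        grouped.setdefault(a.namespace, {}).setdefault(a.key, []).append(a.value)
    payload = json.dumps(grouped, ensure_ascii=False, sort_keys=True)
    size = len(payload.encode("utf-8"))
    if size > MAX_ANNOTATION_BYTES:
        raise ValueError(
            f"annotations use {size} bytes, limit is {MAX_ANNOTATION_BYTES}"
        )
    return payload


# Placeholder values, not real identifiers.
example = [
    Annotation("book", "isbn", "0123456789"),
    Annotation("url", "expanded", "http://example.com/a-long-story-url"),
]
print(serialize(example))
```

Grouping by namespace keeps each namespace's keys together, which would let a client ignore namespaces it does not understand.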

It’s a smart decision, but not one without drawbacks. Too much passivity could lead to early fragmentation and confusion: for instance, we’ll probably see a proliferation of annotation formats for the url, info, and possibly even media namespaces. Such confusion might delay the adoption of annotations by Twitter clients, since inconsistent annotations are harder to support than none at all. It is also possible that the most prominent clients will carve out their own functionality via privately documented annotations, so there might be a tweetdeck or seesmic namespace before there are general ones.

So, while I think Twitter, Inc. should indeed stay away from specifying annotations, it would be very helpful if they provided an official space (perhaps a wiki) for developers to document and discuss possible uses for annotations. It would also be great if a few suggestions were already there for developers looking to get started before annotations launch. So, what kind of annotations might make sense? Here are some thoughts of mine:

  • URL information – I don’t think annotations will eliminate tiny URLs from tweets (at least not in the next few years), but I do think it makes sense for annotations to contain expanded or canonical URLs for the tiny URLs mentioned in the body. This would allow researchers to analyze URLs years after the shortener vanishes (although it could also create an opportunity for phishing where the tiny URL does NOT go to the canonical destination provided).
  • Media – the MediaRSS standard seems like a natural fit for services like Twitpic and YFrog, allowing tweets to specify thumbnail URLs, dimensions, etc.
  • Product information – ReadWriteWeb wrote an extensive piece about annotations where they suggested that existing product identifiers like ISBNs would make for excellent annotation material. I agree.
  • Urgency – some sort of mechanism for indicating that some tweets are breaking news or urgent in some other way has been suggested by a few developers I spoke with. Of course, this could be abused by that one guy you know who always sends all his emails with maximum priority.
  • Meta – A namespace for specifying information about the annotations in the document. It’s unclear if we would want one place for this, or a meta key in each namespace.
  • CSS – Finally, with the font-family key, you CAN tweet in Comic Sans. This is me being silly, but one could theoretically see some artsy use cases for tweet-specific CSS styling specified via annotations.
  • Linked Data – The most intriguing potential use case of annotations is Linked Data. Most tweets from the @nytimes Twitter feed are stories with internal cataloging information that says what the story is about (the people, places, organizations, etc.). This information is useful in itself, but when it is linked to an outside taxonomy like DBpedia, it becomes compatible with global taxonomies. That means you can search for tweets tagged ‘Twitter (Organization)’ and find stories tagged with any of the linked taxonomies.

These are just a few ideas off the top of my head, and I’m looking forward to what proposals other developers might make. Of course, the real killer use of annotation is not in displaying tweets on the timeline but search and the streaming API. Imagine being able to retrieve tweets tagged with a certain annotation key, or key/value, or even with a certain namespace. Combine this with a generalized annotation scheme like Linked Data and it’s suddenly possible to search for all tweets with images or links to books or stories about fine dining. Or so the hopes go. Reality will likely be messier.
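As a rough illustration of that kind of query, the sketch below filters an in-memory list of tweets by namespace, by key, or by key/value. No search or streaming parameters for annotations have been announced, so the tweet layout (a dict with `text` and a list of `(namespace, key, value)` tuples) and the `matches` and `search` helpers are hypothetical.

```python
def matches(annotation, namespace=None, key=None, value=None):
    """Return True if the triple satisfies every criterion that was given."""
    ns, k, v = annotation
    return (
        (namespace is None or ns == namespace)
        and (key is None or k == key)
        and (value is None or v == value)
    )


def search(tweets, namespace=None, key=None, value=None):
    """Return tweets that carry at least one matching annotation."""
    return [
        tweet
        for tweet in tweets
        if any(matches(a, namespace, key, value) for a in tweet["annotations"])
    ]


# Hypothetical sample data; the values are placeholders.
tweets = [
    {"text": "Great read", "annotations": [("book", "isbn", "0123456789")]},
    {"text": "Lunch!", "annotations": [("media", "thumbnail", "http://example.com/t.jpg")]},
    {"text": "No metadata here", "annotations": []},
]

for tweet in search(tweets, namespace="media"):
    print(tweet["text"])
```

A tweet with no annotations never matches, which is the messy part: any search built this way only sees the tweets whose clients bothered to annotate.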

For starters, adoption of annotations will probably be halting and inconsistent (that’s the subject of another post), largely dependent on the ability of Twitter client developers to make it happen. Text search is not going away anytime soon. And annotations will not eliminate confusion by themselves. For instance, it seems to me that the majority of annotations will describe what the tweet is linking to, rather than the contents of the tweet itself. Similarly, I imagine we will see automated annotation in some clients, where the word Paris in the text leads to the tweet being annotated as being about Paris the city when it’s really not at all. Of course, this is all just quibbling that would make a research librarian proud, but it matters to some.

Still, let’s not lose our enthusiasm for annotations. This could be big, and I’m excited to see where it will wind up. Let’s get out there and start coding.

Update: And then, just like that, Facebook announced the Open Graph Protocol, which, among other things, specifies an annotation namespace that can include things like book information, people mentioned, etc. It’s an attractive standard defined for Facebook’s “Like” mechanism that will probably be quickly adopted within Twitter annotations as well. Unfortunately, with the exception of UPCs and ISBNs, all Open Graph fields are arbitrary text. This means there will still be a lot of ambiguity within the annotations compared to Linked Data (for instance, people might tag “Barack Obama”, “Obama”, “Barack Hussein Obama”, “President Obama” all for tweets about Obama), but it also means there might be greater usage.
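The difference can be sketched in a few lines of Python. The four labels come from the example above; the `subject` field name, the `group_by_subject` helper, and the use of a DBpedia-style URI as the shared identifier are illustrative assumptions, not fields defined by Open Graph or by Twitter.

```python
from collections import defaultdict


def group_by_subject(tweets):
    """Bucket tweet texts by the exact value of their subject annotation."""
    buckets = defaultdict(list)
    for tweet in tweets:
        buckets[tweet["subject"]].append(tweet["text"])
    return dict(buckets)


labels = ["Barack Obama", "Obama", "Barack Hussein Obama", "President Obama"]

# Free-text tagging: each client picks its own spelling.
free_text = [{"text": f"tweet {i}", "subject": label} for i, label in enumerate(labels)]

# Linked Data tagging: every client points at the same identifier.
uri = "http://dbpedia.org/resource/Barack_Obama"
linked = [{"text": f"tweet {i}", "subject": uri} for i in range(len(labels))]

print(len(group_by_subject(free_text)))  # 4 separate buckets
print(len(group_by_subject(linked)))     # 1 shared bucket
```

With free text, a search for one spelling misses the other three unless someone maintains a list of aliases.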

  1. Jacob,

    I think you are spot on in your initial recommendations of what metadata Twitter annotations might include.

    I agree that Linked Data / Semantic Web schema could be particularly useful. It could massively scale search effectiveness and distribution opportunities while allowing sophisticated on-the-fly analysis of Twitter’s firehose & other feeds.

    The Twitter ecosystem is particularly suited to applying principles of the Semantic Web in that machine interpolated meaning and “Twitter Resonance” could be continually refined the more humans tweet & retweet about the same and related topics (resolved by semantic entity detection, coupled with author, location, hashes, platform, temporal and other information).

    The Marshall Kirkpatrick RWW piece you referenced suggests that Twitter intends to leave the annotation classification system to be determined by the market.

    @ZacharyCohen opined today “that social media, in its purest form, is our generations’ destiny hammer.”

    I suspect the conversation on how Twitter annotation metadata standards might best serve potential services is just beginning. I hope to find your thoughtful voice joining others in this rare opportunity to help shape an attribute of a powerful social media tool.

  2. Nice post, Jacob. Along with using Annotations for urgency, we should also use them for corrections. My latest column for Columbia Journalism Review picks up this idea:

    Your post helped me, so thanks.

    • Kevin Mark
    • December 22nd, 2010


  1. April 26th, 2010
  2. January 14th, 2011
