Waves of Annotation
Two weeks after the announcement (and some big waves by Facebook), there still seems to be serious interest in Twitter annotations. And yet, we have to wonder how much annotations will catch on, especially since adoption will depend on loosely coordinated actions among a large number of players.
I am admittedly no futurist, but I’ll take a stab at prognostication. For starters, I’d avoid waxing rhapsodic about our glorious semantic-web future. I would say it’ll be at least six months until we see widespread client usage of Twitter annotations, and it may be that only a small fraction of tweets will be annotated even in the next few years. Progress may be fitful and sporadic, driven in the early days by some major adopters. Most users will remain unaware of the functionality or opt out.
How can I guess the future? By looking at the past. Twitter has already introduced one kind of annotation to tweets: geolocation. And it didn’t exactly spread like wildfire. Five months after its launch, geotagged tweets are still a rarity in the general stream (a few hours of the Twitter Sampling API revealed that only about 0.85% of the tweets collected had locations). Of course, annotations are not exactly the same as geolocation: users may be more interested in sharing what books they’re talking about than where they are. But geotagging suggests a general model for adoption that will probably hold true. I’m guessing we’ll see the following waves of annotation:
Wave 0: Early Hacking When Twitter Annotations launch, expect to see an initial burst of one-off hacks where developers put them through their paces and play with possibilities both silly and sublime. Annotations will probably roll out with a developer wiki or mailing list, so some standards will coalesce pretty quickly, although some may fall out of favor in later stages.
Wave 1: Automated Tweets (1-3 months later) The next major wave of annotated tweets will come when the various websites that tweet automatically add the metadata they have to their messages. For instance, we’ll probably see the following annotations right off the bat:
- MediaRSS or similar extensions for describing photographs on photo-posting services like TwitPic or YFrog
- Product information alongside “Tweet This Product” links from retailers like Amazon. This will probably include UPC or ISBN as well as links to a purchasing page.
- Book-specific metadata from sites like Goodreads or Readernaut where users rate books they have just read. Many of these sites will likely use Facebook’s Open Graph Protocol, since they will already want Facebook integration as well.
- “Like” actions on various websites. Here we’ll probably see a tussle between the Open Graph Protocol and Activity Streams standards.
- News-specific metadata from news feeds. This might include generalized concepts like urgency or the byline or publisher-specific metadata. Linked Data would be helpful to connect stories across various taxonomies.
Automated services may very well remain the predominant source of annotated tweets. For instance, remember that figure about geotagged tweets earlier? Foursquare checkins accounted for 32% of them, and I imagine it would be higher if I ran my sample at night. More on that below.
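A tally like the one above could be sketched as follows. This is not the script behind the figures quoted here, only a minimal illustration: it assumes each collected tweet is a dict with a "geo" field that is empty when no location is attached and a "source" field naming the posting application, and the four sample tweets are made up.

```python
from collections import Counter

def geotag_stats(tweets):
    """Return (fraction of tweets with a location, Counter of sources among those)."""
    total = 0
    sources = Counter()
    for tweet in tweets:
        total += 1
        # Assumed field: "geo" is missing or None when no location is attached.
        if tweet.get("geo") is not None:
            sources[tweet.get("source", "unknown")] += 1
    geotagged = sum(sources.values())
    fraction = geotagged / total if total else 0.0
    return fraction, sources

# Made-up sample, only to show the shape of the input.
sample = [
    {"text": "lunch", "geo": None, "source": "web"},
    {"text": "I'm at the cafe", "geo": {"coordinates": [40.7, -74.0]}, "source": "foursquare"},
    {"text": "hello", "source": "web"},
    {"text": "on the road", "geo": {"coordinates": [34.0, -118.2]}, "source": "mobile"},
]

fraction, sources = geotag_stats(sample)
foursquare_share = sources["foursquare"] / sum(sources.values())
```

Run against a few hours of sampled tweets instead of the toy list, fraction would be the geotagged share and foursquare_share the portion of geotagged tweets coming from checkins.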
Wave 2: General Standards Emerge (2-4 months) In the beginning, everybody will probably annotate alone, using their own terminology and taxonomies: Amazon will probably use an amazon namespace, the New York Times will use nyt, and every Twitter photo-sharing site will probably use the media namespace in widely different ways. As a result, searches for things like tweets about books or Barack Obama will return only a fraction of the relevant annotated tweets. Ultimately, certain conventions will predominate (whether adapted from existing standards or created from scratch), precisely because it’s easier to build tools against a few standards than many.
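To make the fragmentation problem concrete, here is a small sketch of what tool builders would face before conventions settle. Everything in it is hypothetical: the annotation shape (a list of {namespace: {key: value}} dicts), the amazon/goodreads/book namespaces and keys, and the alias table are assumptions for illustration, not a published format.

```python
# Hypothetical alias table: (namespace, key) pairs that mean the same thing.
CANONICAL = {
    ("amazon", "isbn"): ("book", "isbn"),
    ("goodreads", "isbn13"): ("book", "isbn"),
}

def normalize(annotations):
    """Yield (namespace, key, value) triples with known aliases rewritten."""
    for annotation in annotations:
        for namespace, attrs in annotation.items():
            for key, value in attrs.items():
                ns, k = CANONICAL.get((namespace, key), (namespace, key))
                yield ns, k, value

def naive_match(tweet, namespace, key):
    """Match only tweets that use exactly the requested namespace and key."""
    return any(key in a.get(namespace, {}) for a in tweet.get("annotations", []))

def normalized_match(tweet, namespace, key):
    """Match after mapping every site-specific alias to its canonical form."""
    return any((ns, k) == (namespace, key)
               for ns, k, _ in normalize(tweet.get("annotations", [])))

# Made-up tweets, each describing the same book in a different vocabulary.
tweets = [
    {"text": "Buy this!", "annotations": [{"amazon": {"isbn": "9780000000001"}}]},
    {"text": "5 stars", "annotations": [{"goodreads": {"isbn13": "9780000000001"}}]},
    {"text": "Reading now", "annotations": [{"book": {"isbn": "9780000000001"}}]},
]

naive_hits = [t for t in tweets if naive_match(t, "book", "isbn")]
normalized_hits = [t for t in tweets if normalized_match(t, "book", "isbn")]
```

The naive search finds one of the three tweets, while the normalized search finds all of them, but only because someone maintained the alias table. Every new vocabulary adds another row, which is the pressure that pushes everyone toward a few shared standards.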
Wave 3: Client Support (6-9 months) Eventually, some API clients will add support for annotation within the client. I doubt clients or the Twitter website will just add direct fields for entering annotation triples; that would be a user-experience disaster. Instead, we will probably see the following features rolled out once the automated tweeting services have established some standards:
- Display of extended URL information contained within tweets.
- Display of thumbnails or other metadata from annotated picture tweets.
- ISBN and UPC lookup against Amazon or other retailers (with purchasing links, naturally)
- Allowing the user to explicitly annotate tweets about books or movies via a pop-up wizard.
- If Facebook adds Open Graph Protocol annotation for user statuses, then many clients that support posting to both Twitter and Facebook will likely add annotations using the Open Graph namespace.
I would guess, though, that automated services will always be the predominant producers of annotated tweets, and clients will mostly focus on displaying annotations, simply because the user experience on third-party services will almost always be more fun than the bare information sharing of a Twitter client. This is why Foursquare accounts for a hefty chunk of geotagged tweets. Similarly, a user on Goodreads gets cumulative summaries, search, and other features beyond what would be available in most Twitter clients. Ultimately, though, the reason there won’t be many people manually annotating their tweets is that it runs counter to the reason we joined Twitter in the first place: unlike a blog, it’s just one simple text entry box, and any additional input (no matter how seamless) will most likely never get used. And so, from here on out, we’ll probably see decreasing adoption rates for annotations on Twitter.
Wave 4: Automated Entity Extraction and Aggregation (9-12 months) So, it’s unlikely that many users will manually annotate their tweets in clients (although they may be willing to use third-party sites that do it for them). But what if annotation were mostly automatic? Consider semantic analysis in the client that watches your typing and presents suggested annotations below the box, to be reviewed or deleted, the same way Twitter.com suggests your neighborhood. This might increase adoption to some degree, but I remain skeptical. More likely we’ll see a larger uptick from the many retweeting bots on Twitter when they add semantic analysis and annotations to their bags of tricks. So, the @nytwriters account might retweet and add the annotation og:organization='The New York Times' to tweets.
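As a rough idea of what the simplest such bot might do, here is a sketch that stands in a plain keyword lookup for real semantic analysis. The phrase table, the suggest_annotations name, and the {namespace: {key: value}} annotation shape are all assumptions made for illustration.

```python
import re

# Hypothetical lookup table; a real bot would use a proper entity extractor.
KNOWN_ORGANIZATIONS = {
    "new york times": "The New York Times",
    "nytimes": "The New York Times",
}

def suggest_annotations(text):
    """Return og:organization annotations for organizations named in the text."""
    lowered = text.lower()
    found = []
    for phrase, name in KNOWN_ORGANIZATIONS.items():
        # Word boundaries keep a phrase from matching inside a longer word.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            annotation = {"og": {"organization": name}}
            if annotation not in found:  # avoid duplicates from multiple aliases
                found.append(annotation)
    return found

suggested = suggest_annotations("Great piece in the New York Times today")
```

A client could show the same suggestions below the text box for the user to keep or delete. A keyword table like this will miss most mentions and misfire on some, which is part of why I remain skeptical about how far automatic suggestion gets us.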
Final Thoughts So, what’s the ultimate takeaway from this lengthy piece? Mainly, annotations will probably be adopted in distinct waves, with the largest bump in the middle resulting from automated services. And, despite some hopes to the contrary, annotations will probably be used on only a very small fraction of tweets, of which the bulk will be from automated services like TwitPic, YouTube, or Foursquare. I am not saying annotations will be useless; on the contrary, nothing is more powerful than giving sites with metadata the tools to share it. But as in most other forms of media, freeform text (with all of its ambiguities) will always remain predominant on Twitter; for instance, if your tools just focus on retrieving tweets annotated movie:title="Avatar", you’ll indeed avoid all those tweets about people’s Twitter avatars, but you’ll also miss almost all of the conversation about the movie (the precision/recall curve strikes again).
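The Avatar trade-off can be put in numbers with a toy example. The six tweets below are invented, the about_movie flag is a hand-assigned label, and the annotation shape is an assumption; only the precision and recall definitions are standard.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up tweets; "about_movie" is the hand-assigned ground truth.
tweets = [
    {"id": 1, "text": "Avatar was stunning in 3D",
     "annotations": [{"movie": {"title": "Avatar"}}], "about_movie": True},
    {"id": 2, "text": "just saw avatar, wow", "annotations": [], "about_movie": True},
    {"id": 3, "text": "Is Avatar worth the ticket price?", "annotations": [], "about_movie": True},
    {"id": 4, "text": "avatar tonight with friends", "annotations": [], "about_movie": True},
    {"id": 5, "text": "changed my twitter avatar", "annotations": [], "about_movie": False},
    {"id": 6, "text": "love your new avatar!", "annotations": [], "about_movie": False},
]

relevant = {t["id"] for t in tweets if t["about_movie"]}
by_annotation = {t["id"] for t in tweets
                 if {"movie": {"title": "Avatar"}} in t["annotations"]}
by_keyword = {t["id"] for t in tweets if "avatar" in t["text"].lower()}

annotation_scores = precision_recall(by_annotation, relevant)
keyword_scores = precision_recall(by_keyword, relevant)
```

In this toy set the annotation search is perfectly precise but recovers only a quarter of the movie tweets, while the keyword search recovers all of them at the cost of pulling in the profile-picture chatter.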
But I could be wrong. Thoughts?