The Long And Short Of It
Recently, Jay Rosen pointed out that Bitly analytics seemed to report many more hits than other URL shorteners. The problem looked eerily familiar, and then I realized it was the same issue I had observed a few months back; I had even half-written a blog post about it that never got posted (if you read my blog, this should come as no surprise). In the interest of explaining things thoroughly, here finally is a post on the matter.
Bitly has become the preeminent URL shortening service, largely because it provides easy access to analytics on usage (you merely need to append a plus sign to the shortened URL). This is the reason I converted all of the automated New York Times twitter accounts over to bitly 3-4 months ago, in the hopes of sharing the analytics of twitter user behavior with the world. However, when I collected a month’s worth of link usage statistics from Bitly, I noticed that the quoted total seemed unusually high: 3-5x the hits recorded by our page analytics software. At first I thought it might be a result of comparing apples to oranges: the bitly counts might include other users shortening the same URL, and internal analytics might be undercounting twitter traffic, since very few twitter users come to the site via twitter.com.
Fortunately, though, I have a way to check apples directly against apples. When my automated script posts a URL to twitter, it appends a query string to the URL before shortening to indicate that the click is coming from a twitter client (and also which account). So, if you look at the URL for this movie review of District 9 posted to the nytimes twitter account last week you’ll notice it has some additional arguments on the end:
you can see it has ?src=twt&twt=nytimes appended to the end of it. This allows me to tag all hits from my automated accounts and compare directly the numbers from bitly (because nobody else is shortening that specific URL) to the numbers reported from internal analytics (because those hits are coming solely from the automated accounts). The really savvy among you might notice I can also distinguish traffic from different accounts on twitter. But this is where the discrepancy became apparent. Here are the counts for that URL in bitly and internal analytics:
- WebTrends: 928 hits
- Bitly: 1505 hits (162%)
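As a rough illustration of the tagging scheme and the comparison above, here is a minimal Python sketch. It is not my actual posting script: the function names `tag_url` and `overcount_percent` and the example.com URL are made up for the example. Only the `src=twt` and `twt=<account>` parameters and the two hit counts come from the numbers above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def tag_url(url, account, source="twt"):
    """Append src/twt tracking parameters, keeping any existing query string."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params += [("src", source), ("twt", account)]
    return urlunsplit(parts._replace(query=urlencode(params)))


def overcount_percent(shortener_hits, analytics_hits):
    """Shortener hits as a rounded percentage of the hits seen by site analytics."""
    return round(100 * shortener_hits / analytics_hits)


# Hypothetical article URL; the real one would be the review's address.
tagged = tag_url("http://example.com/movies/review.html", "nytimes")
print(tagged)                        # http://example.com/movies/review.html?src=twt&twt=nytimes
print(overcount_percent(1505, 928))  # 162
```

Because the tagged URL is unique to one account, both the shortener count and the internal count refer to the same set of links, which is what makes the comparison fair.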
The difference can vary depending on the link. Some counts can be very close. But a few months back, I saw Bitly overcounts of up to 4-5x. So, what’s happening here? I contacted Bitly support about it and they explained the issue is that Bitly counts expansions of the URL and not necessarily clickthroughs. At first, this seemed like a design flaw in the service, but I now realize it’s an unavoidable limitation of every URL shortener out there, and the problems are probably most apparent with Bitly because they are the most prominent one in use. Let me explain.
A URL shortener works in a particular way. A user sees a short URL like http://bit.ly/TWf3E. The web browser sends a request to the bitly web server for that page. The bitly server counts that as a hit and sends an HTTP redirect reply back to the web browser with the expanded URL which the browser then hops to. This is how it should work, and in an ideal world the subsequent request would register a hit on the real destination. But bitly has no way to know if the user followed the redirect. It could be the user’s machine crashed. It could be that the browser doesn’t follow redirects. It could be it’s a web crawler that gets counted by bitly but is ignored by the real site’s analytics. Or it could be a tool that does the request only to display the expanded URL to a user. All of these scenarios could be counted as a “hit” at the shortener without being a clickthrough to the remote site.
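To make the gap between expansions and clickthroughs concrete, here is a toy in-memory simulation in Python. Everything in it (`ToyShortener`, `ToySite`, the short code `abc`, the example.com URL) is invented for illustration and says nothing about how bitly is actually implemented. The only thing it models is the flow described above: the shortener counts the request, replies with a redirect, and never learns whether the client followed it.

```python
class ToyShortener:
    """Counts a hit whenever a short code is resolved."""

    def __init__(self):
        self.links = {}
        self.hits = {}

    def shorten(self, code, long_url):
        self.links[code] = long_url
        self.hits[code] = 0

    def resolve(self, code):
        # The hit is recorded at request time; the shortener has no way
        # to know what the client does with the redirect afterwards.
        self.hits[code] += 1
        return 302, self.links[code]


class ToySite:
    """Stands in for the destination site's own analytics."""

    def __init__(self):
        self.pageviews = 0

    def get(self, url):
        self.pageviews += 1


def browser_click(shortener, site, code):
    """A browser resolves the short URL and then follows the redirect."""
    status, location = shortener.resolve(code)
    site.get(location)


def preview_tool(shortener, code):
    """A tool that only wants to display the expanded URL to a user."""
    status, location = shortener.resolve(code)
    return location


shortener = ToyShortener()
site = ToySite()
shortener.shorten("abc", "http://example.com/story?src=twt&twt=nytimes")

for _ in range(3):
    browser_click(shortener, site, "abc")
for _ in range(2):
    preview_tool(shortener, "abc")

print(shortener.hits["abc"], site.pageviews)  # 5 3
```

Three real visits and two previews produce five hits at the shortener but only three pageviews at the destination.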
To their credit, bitly has recognized the problem and has been working on various mechanisms to reduce this overcounting, and I’ve seen a definite “drop” in bitly counts as more filters are added to remove bogus hits. This has included filtering out bots by recognizing signatures of their actions. And it also has included the addition of an expand method in the API for developers looking to expand URLs without requesting the web page and being counted (it also returns statistics on clicks). But no countermeasures will be perfect. Some bots may be hard to detect. And there are several reasons why developers might prefer to expand Bitly URLs by making an HTTP GET request like web browsers do instead of calling the API: it’s simpler to use, it doesn’t require an API key, and it’s all but guaranteed to be more reliable. If bitly is having catastrophic issues, which service do you think is going to get the most attention? The public website or the niche developer API? Apart from doing the right thing, there’s not much incentive for any developer to use a shortener’s API just to expand URLs. And so, it’s likely that some overcounting will always be with us for any URL shortener.
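Here is a small Python sketch of why signature-based filtering helps but cannot be perfect. I don't know what signals bitly's filters actually use. The `BOT_SIGNATURES` list, the function names, and the sample User-Agent strings below are all assumptions chosen for the example.

```python
# Hypothetical signatures; a real filter would likely use many more signals.
BOT_SIGNATURES = ("bot", "crawler", "spider")


def looks_like_bot(user_agent):
    """True if the User-Agent contains one of the known bot signatures."""
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)


def filtered_count(user_agents):
    """Count only the hits that do not match a bot signature."""
    return sum(1 for ua in user_agents if not looks_like_bot(ua))


# Made-up log of User-Agent strings for one short URL.
hits = [
    "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8) Safari/531.9",
    "ExampleBot/1.0 (+http://example.com/bot)",
    "Python-urllib/2.6",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.5",
]

print(len(hits), filtered_count(hits))  # 4 3
```

The self-identified bot is dropped, but the script making a plain HTTP GET still counts as a hit, even though nobody clicked anything.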
This point matters, because bitly is one of the only truly open analytics services out there. And people quote its numbers like they come straight from God, often because they have no alternatives. Most companies (my employer included) will not share their statistics with the general public, although I have not yet chased down the reason why. And bitly hits are recorded at the origin of following a link, rather than the destination, making them essential for understanding a service like twitter. But such ubiquity also deserves some skepticism. That certain argument about the worth of twitter followers suffers if actual traffic is half or a tenth of what is reported. That outside audit of site traffic might include many small errors adding up to a big deviation from internal numbers. Bitly remains a highly useful service, but it’s important for all of us to remember how the hits are counted.