The Long And Short Of It

Recently, Jay Rosen pointed out that Bitly analytics seemed to report many more hits than other URL shorteners. The problem looked eerily familiar, and then I realized it was the same issue I had observed a few months back. I had even half-written a blog post about it that never got posted (if you read my blog, this should come as no surprise). In the interest of explaining things thoroughly, here finally is a post on the matter.

Bitly has become the preeminent URL shortening service, largely because it provides easy access to usage analytics (you merely need to append a plus sign to the shortened URL). That is why I converted all of the automated New York Times twitter accounts over to bitly 3-4 months ago, in the hope of sharing analytics on twitter user behavior with the world. However, when I collected a month's worth of link usage statistics from Bitly, the quoted totals seemed unusually high: 3 to 5 times the hits recorded by our page analytics software. At first I thought it might be a case of comparing apples to oranges. The bitly counts might include other users shortening the same URL. Internal analytics might also be undercounting twitter traffic, since very few twitter users come to the site via twitter.com, so clicks from desktop and mobile twitter clients can't be attributed to twitter by their referrer.

Fortunately, though, I have a way to compare apples directly to apples. When my automated script posts a URL to twitter, it first appends a query string to the URL and then shortens it. The query string indicates that the click is coming from a twitter client, and from which account. For example, look at the URL for this movie review of District 9, posted to the nytimes twitter account last week. You'll notice some additional arguments on the end:

http://movies.nytimes.com/2009/08/14/movies/14district.html?src=twt&twt=nytimes

The appended ?src=twt&twt=nytimes lets me tag all hits from my automated accounts. I can then directly compare the numbers from bitly (nobody else is shortening that specific URL) with the numbers reported by internal analytics (those hits come solely from the automated accounts). The really savvy among you might notice that I can also distinguish traffic from different twitter accounts. But this is where the discrepancy became apparent. Here are the counts for that URL in bitly and in internal analytics (WebTrends):

  • WebTrends: 928 hits
  • Bitly: 1505 hits (162%)
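The apples-to-apples comparison above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script that actually posts to the accounts. The helper names `tag_for_twitter`, `account_for_hit` and `overcount_ratio` are hypothetical; only the `src=twt&twt=<account>` convention comes from this post. The last line simply reproduces the arithmetic behind the 162% figure.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TAG_KEYS = ("src", "twt")  # the query parameters used to tag twitter links


def tag_for_twitter(url, account):
    """Append src=twt&twt=<account> to a URL before it is shortened.

    Hypothetical helper: existing src/twt parameters are dropped first so a
    URL is never tagged twice; other query parameters keep their order.
    """
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k not in TAG_KEYS]
    query += [("src", "twt"), ("twt", account)]
    return urlunsplit(parts._replace(query=urlencode(query)))


def account_for_hit(url):
    """Return the twitter account a hit came from, or None if untagged."""
    params = dict(parse_qsl(urlsplit(url).query))
    return params.get("twt") if params.get("src") == "twt" else None


def overcount_ratio(shortener_hits, site_hits):
    """Shortener hits expressed as a multiple of the destination's count."""
    return shortener_hits / site_hits


url = tag_for_twitter(
    "http://movies.nytimes.com/2009/08/14/movies/14district.html", "nytimes")
print(url)
# http://movies.nytimes.com/2009/08/14/movies/14district.html?src=twt&twt=nytimes
print(account_for_hit(url))                 # nytimes
print(f"{overcount_ratio(1505, 928):.0%}")  # 162%
```

Because the tag is applied before shortening, every hit that reaches the site with `src=twt` can only have come through one of the automated accounts' short links. That is what makes the two counts comparable.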

The difference varies from link to link, and some counts can be very close. But a few months back, I saw Bitly overcounts of up to 4-5x. So, what's happening here? I contacted Bitly support about it. They explained that Bitly counts expansions of the URL, not necessarily clickthroughs. At first this seemed like a design flaw in the service. But I realized it's an unavoidable limitation of every URL shortener out there. The problem is probably most apparent with Bitly because it is the most prominent one in use. Let me explain.

A URL shortener works like this. A user sees a short URL like http://bit.ly/TWf3E and clicks it. The web browser sends a request for that page to the bitly web server. The bitly server counts that as a hit and replies with an HTTP redirect containing the expanded URL, which the browser then follows. That is how it should work, and in an ideal world the subsequent request would also register a hit at the real destination. But bitly has no way to know whether the user actually followed the redirect. Maybe the user's machine crashed. Maybe the browser doesn't follow redirects. Maybe it's a web crawler that gets counted by bitly but is ignored by the real site's analytics. Or maybe it's a tool that makes the request only to display the expanded URL to a user. All of these scenarios can be counted as a “hit” at the shortener without being a clickthrough to the remote site.
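The core of the problem can be shown with a toy model of the redirect step. This is a sketch under stated assumptions, not bitly's code. The link table, the 301 status code, and the names `handle_short_url` and `simulate_client` are all hypothetical. The point it illustrates is that the hit is recorded when the shortener answers the request, before the client decides whether to follow the redirect.

```python
from collections import Counter

# Hypothetical in-memory link table; a real shortener would presumably
# keep this in persistent storage.
LINKS = {
    "TWf3E": "http://movies.nytimes.com/2009/08/14/movies/14district.html"
             "?src=twt&twt=nytimes",
}

expansions = Counter()     # what the shortener sees and reports
clickthroughs = Counter()  # what the destination's analytics would see


def handle_short_url(code):
    """Answer a request for a short URL with a redirect (assumed 301).

    The hit is counted here, before the client acts on the redirect, so
    the shortener cannot tell a real click from a mere expansion.
    """
    long_url = LINKS.get(code)
    if long_url is None:
        return 404, {}
    expansions[code] += 1
    return 301, {"Location": long_url}


def simulate_client(code, follows_redirect):
    """A browser follows the Location header; a crawler or expander may not."""
    status, headers = handle_short_url(code)
    if status in (301, 302) and follows_redirect:
        clickthroughs[headers["Location"]] += 1


simulate_client("TWf3E", follows_redirect=True)   # a person clicking
simulate_client("TWf3E", follows_redirect=False)  # a crawler
simulate_client("TWf3E", follows_redirect=False)  # a tool showing the long URL
print(expansions["TWf3E"], sum(clickthroughs.values()))  # 3 1
```

Three requests reach the shortener, but only one of them would ever show up in the destination site's analytics. Nothing inside `handle_short_url` can distinguish the three cases.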

To its credit, bitly has recognized the problem and has been working on various mechanisms to reduce this overcounting. I've seen a definite “drop” in bitly counts as more filters are added to remove bogus hits. These efforts have included filtering out bots by recognizing signatures of their behavior. They have also included adding an expand method to the API, for developers who want to expand URLs without requesting the web page and being counted (it also returns click statistics). But no countermeasure will be perfect, and some bots may be hard to detect. Developers also have several reasons to expand Bitly URLs with an HTTP GET request, as web browsers do, instead of calling the API. It's simpler, it doesn't require an API key, and it's all but guaranteed to be more reliable. If bitly is having catastrophic issues, which service do you think is going to get the most attention: the public website or the niche developer API? Apart from doing the right thing, there's little incentive for any developer to use a shortener's API just to expand URLs. So some overcounting is likely to always be with us, for any URL shortener.
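Why "no countermeasure will be perfect" can be illustrated with the simplest kind of bot filter: matching known substrings in the User-Agent header. This is a hypothetical sketch, not bitly's filter, whose details are not described here. The signature list and the function names `looks_like_bot` and `filtered_hits` are invented for illustration.

```python
# Hypothetical, deliberately short list of substrings that often appear in
# the User-Agent headers of crawlers and scripts. Real filters are
# presumably far more elaborate (request rates, IP ranges, behavior).
BOT_SIGNATURES = ("bot", "crawler", "spider", "curl", "wget", "python-urllib")


def looks_like_bot(user_agent):
    """Guess whether a request came from software rather than a person."""
    ua = (user_agent or "").lower()
    if not ua:
        return True  # treat a missing User-Agent as scripted
    return any(sig in ua for sig in BOT_SIGNATURES)


def filtered_hits(user_agents):
    """Count only the requests that pass the bot filter."""
    return sum(1 for ua in user_agents if not looks_like_bot(ua))


FIREFOX = ("Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1) "
           "Gecko/20090715 Firefox/3.5")

sample_user_agents = [
    FIREFOX,                                            # a person
    "Googlebot/2.1 (+http://www.google.com/bot.html)",  # filtered out
    "curl/7.19.5",                                      # filtered out
    None,                                               # filtered out
    FIREFOX,                                            # a script posing as a browser
]
print(filtered_hits(sample_user_agents))  # 2
```

The last request is counted as a human because it sends the same header as a real browser. A tool that expands links with a plain GET and a browser-like User-Agent is invisible to this kind of filter.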

So, who cares? The ugly truth about web analytics is that nearly all of it involves some amount of error. The phrase should be “Lies, damned lies, and web analytics.”

Each measurement method has its own blind spot:

- Analysis of web server logs can overcount automated traffic from web crawlers.
- Modern analytics programs embed javascript on pages to thwart bots. They can undercount clients that don't load the entire page or don't execute javascript (a particular concern with mobile browsers).
- Panel-driven approaches like Nielsen's can severely undercount, because they extrapolate from focus groups that might not closely represent the actual Internet audience.

Some techniques might be more accurate than others. But the central problem of web analytics is that you never know for certain how many of your visitors are actually human. It gets even fuzzier when you try to discern unique visitors from the flow of hits that make up your site's traffic. Any estimate of visitors is built on assumptions about web usage, and so adds its own layers of distortion and error. I would hope many web analytics experts understand this and take their stats with the appropriate grains of salt. For instance, suppose I see a surge in bitly clickthroughs for a New York Times article about twitter. Is that incontrovertible evidence that twitter-themed stories are eagerly consumed by twitter users? Or is it more a case of automated bots grabbing and expanding any link tagged twitter? Or is it both? You'll never know with absolute certainty.

This point matters, because bitly is one of the only truly open analytics services out there. People quote its numbers as if they came straight from God, often because they have no alternative. Most companies (my employer included) will not share their statistics with the general public, although I have not yet chased down the reason why. Bitly hits are also recorded at the origin of a followed link rather than at its destination, which makes them essential for understanding a service like twitter. But such ubiquity also deserves some skepticism. An argument about the worth of twitter followers suffers if actual traffic is half or a tenth of what is reported. An outside audit of site traffic might include many small errors that add up to a big deviation from internal numbers. Bitly remains a highly useful service, but it's important for all of us to remember how its hits are counted.

    • Noah Gray
    • August 18th, 2009

    Thanks for the information. RE: why companies won't reveal click/download statistics, many just don't want to do this openly without knowing where they stand relative to their competitors. Unilateral disclosure may earn an approving slap on the back from the few interested members of the public, but it leaves the generous provider vulnerable to attack. Few will delve into the nuances of web analytics accuracy, so the raw numbers are likely to be interpreted and digested as is. One could imagine a non-sharing competitor luring advertising dollars away by clandestinely providing better traffic numbers to clients. Those numbers would be hard for the generous company to dispute if they are not public.

  1. I tend to agree that analytical interpretation is certainly a wild card, especially when there are marketing dollars to be gotten. I don't understand who gets the link credit from an SEO perspective when URLs are shrunk. My limited knowledge of GETs & redirects leads me to the marketing gal's conclusion: no SEO (linkback) credit is given to the web site of the original URL shrinker-upper. Is this correct?
    Technically curious

    • kira
    • August 28th, 2009

    Thanks so much for the explanation…we’ve been trying to figure this out for months!

  2. In our recent study, TWITTER: The Dark Side – Does Bitly Enables Massive Click Fraud, we have proven that the statistics Bit.ly provides are egregiously inaccurate at best and fraudulent at worst. Bit.ly counts cyberspace's ghosts and drones, bots and crawlers, presenting them all as humans.

    They do not discriminate between real and robotically generated automatic clicks with no actual human behind the click.

    We even found a single bitly link with more than a million clicks (1,677,769 to be precise) without a single human behind it. This in-depth analysis can be found here:
    http://www.seo-artworks.com/Twitter/twitter-study-millionclicks.htm

  3. I've recently become more and more skeptical of web analytics systems like Google Analytics and WebTrends. It's tough to get a pass-through click tracker (like bit.ly), an image- or script-based tracker, and log file analysis to even vaguely resemble one another. It seems like each method leaves something big out. The click trackers miss clicks that go around them (if they use a redirect), the image/script trackers often get blocked by anti-malware software, and log files miss content hits cached upstream.

    @roman – I'm not surprised that bots are being counted.
