Nimble Code

The Long And Short Of It

Recently, Jay Rosen pointed out that Bitly analytics seemed to report many more hits than other URL shorteners. The problem looked eerily familiar, and then I realized it was the same issue I had observed a few months back; I had even half-written a blog post about it that never got posted (if you read my blog, this should come as no surprise). In the interest of explaining things thoroughly, here finally is a post on the matter.

Bitly has become the preeminent URL shortening service, largely because it provides easy access to analytics on usage (you merely need to append a plus sign to the shortened URL). This is the reason I converted all of the automated New York Times twitter accounts over to bitly 3-4 months ago, in the hopes of sharing the analytics of twitter user behavior with the world. However, when I collected a month’s worth of link usage statistics from Bitly, I noticed that the quoted total seemed unusually high: 3-5x the hits recorded by our page analytics software. At first I thought it might be a result of comparing apples to oranges: the bitly counts might include other users shortening the same URL, and internal analytics might be undercounting twitter traffic, since very few twitter users come to the site via twitter.com.

Fortunately, though, I have a way to check apples directly against apples. When my automated script posts a URL to twitter, it appends a query string to the URL before shortening to indicate that the click is coming from a twitter client (and also which account). So, if you look at the URL for this movie review of District 9 posted to the nytimes twitter account last week, you’ll notice it has some additional arguments on the end:

http://movies.nytimes.com/2009/08/14/movies/14district.html?src=twt&twt=nytimes

You can see it has ?src=twt&twt=nytimes appended to the end of it. This allows me to tag all hits from my automated accounts and compare the numbers from bitly (because nobody else is shortening that specific URL) directly to the numbers reported from internal analytics (because those hits are coming solely from the automated accounts). The really savvy among you might notice I can also distinguish traffic from different accounts on twitter. But this is where the discrepancy became apparent. Here are the counts for that URL in bitly and internal analytics:

  • WebTrends: 928 hits
  • Bitly: 1505 hits (162%)
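The tagging and the comparison above can be sketched in a few lines of Python. This is only an illustration, not the script I actually run: the function names `tag_for_twitter` and `percent_of` are made up for this example, and the only values taken from the post are the query arguments and the two hit counts.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_for_twitter(url: str, account: str) -> str:
    """Append the src/twt tracking arguments before the URL is shortened."""
    parts = urlsplit(url)
    # Keep any query arguments the URL already has, then add the tags.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [("src", "twt"), ("twt", account)]
    return urlunsplit(parts._replace(query=urlencode(query)))


def percent_of(shortener_hits: int, analytics_hits: int) -> int:
    """Shortener count as a rounded percentage of the analytics count."""
    return round(100 * shortener_hits / analytics_hits)


tagged = tag_for_twitter(
    "http://movies.nytimes.com/2009/08/14/movies/14district.html", "nytimes"
)
print(tagged)
print(percent_of(1505, 928))  # Bitly hits relative to WebTrends hits
```

Because the tagged URL is unique to the automated account, both counts refer to the same set of links, which is what makes the percentage meaningful.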

The difference can vary depending on the link. Some counts can be very close. But a few months back, I saw overcounts up to 4-5x greater for Bitly. So, what’s happening here? I contacted Bitly support about it and they explained the issue is that Bitly counts expansions of the URL and not necessarily clickthroughs. At first, this seemed like a design flaw in the service, but I realize it’s an unavoidable limitation of every URL shortener out there, and the problems are probably most apparent with Bitly because they are the most prominent one in use. Let me explain.

A URL shortener works in a particular way. A user sees a short URL like http://bit.ly/TWf3E. The web browser sends a request to the bitly web server for that page. The bitly server counts that as a hit and sends an HTTP redirect reply back to the web browser with the expanded URL which the browser then hops to. This is how it should work, and in an ideal world the subsequent request would register a hit on the real destination. But bitly has no way to know if the user followed the redirect. It could be the user’s machine crashed. It could be that the browser doesn’t follow redirects. It could be it’s a web crawler that gets counted by bitly but is ignored by the real site’s analytics. Or it could be a tool that does the request only to display the expanded URL to a user. All of these scenarios could be counted as a “hit” at the shortener without being a clickthrough to the remote site.
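To make the gap between expansions and clickthroughs concrete, here is a toy simulation in Python. It is a sketch under stated assumptions and says nothing about how bitly is actually built: the `Shortener` and `Destination` classes, the `visit` helper, the 301 status code, and the example.com destination are all invented for illustration.

```python
class Shortener:
    """Toy stand-in for a URL shortener: every request for a short URL is a hit."""

    def __init__(self, links):
        self.links = links  # short code -> expanded URL
        self.hits = 0

    def request(self, code):
        # The hit is counted before the shortener can know what the client does next.
        self.hits += 1
        return 301, self.links[code]  # redirect status and target (status is an assumption)


class Destination:
    """Toy stand-in for the real site, which only sees requests that actually arrive."""

    def __init__(self):
        self.hits = 0

    def request(self, url):
        self.hits += 1
        return 200


def visit(shortener, destination, code, follows_redirect):
    """One client asks for the short URL and may or may not follow the redirect."""
    status, location = shortener.request(code)
    if status == 301 and follows_redirect:
        destination.request(location)


shortener = Shortener({"TWf3E": "http://example.com/article"})  # placeholder target
site = Destination()

# Three browsers click through; two tools only expand the link to display it.
for follows in (True, True, True, False, False):
    visit(shortener, site, "TWf3E", follows)

print(shortener.hits, site.hits)
```

The shortener records five hits while the destination records three, and nothing in the shortener's own view of the traffic distinguishes the two groups.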

To their credit, bitly has recognized the problem and has been working on various mechanisms to reduce this overcounting, and I’ve seen a definite “drop” in bitly counts as more filters are added to remove bogus hits. This has included filtering out bots by recognizing signatures of their actions. And it also has included the addition of an expand method in the API for developers looking to expand URLs without requesting the web page and being counted (it also returns statistics on clicks). But no countermeasures will be perfect. Some bots may be hard to detect. And there are several reasons why developers might prefer to expand Bitly URLs by making an HTTP GET request like web browsers do instead of calling the API: it’s simpler to use, it doesn’t require an API key, and it’s all but guaranteed to be more reliable. If bitly is having catastrophic issues, which service do you think is going to get the most attention? The public website or the niche developer API? Apart from doing the right thing, there’s not much incentive for any developer to use a shortener’s API just to expand URLs. And so, it’s likely that some overcounting will always be with us for any URL shortener.
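I don’t know what bitly’s filters actually look like, but the general idea of signature-based filtering, and why it can never be complete, can be sketched in Python. Everything here is hypothetical: the `Hit` record, the `BOT_MARKERS` list, and the user-agent strings are assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical signatures; a real filter would need a longer, maintained list.
BOT_MARKERS = ("bot", "crawler", "spider")


@dataclass
class Hit:
    short_code: str
    user_agent: str


def looks_automated(hit: Hit) -> bool:
    """Flag a hit whose user agent contains one of the known markers."""
    agent = hit.user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def filtered_count(hits: list[Hit]) -> int:
    """Count only the hits that were not flagged as automated."""
    return sum(1 for hit in hits if not looks_automated(hit))


hits = [
    Hit("TWf3E", "Mozilla/5.0 (Macintosh) Firefox/3.5"),
    Hit("TWf3E", "ExampleBot/1.0"),
    Hit("TWf3E", "link-expander-script"),  # automated, but no marker matches
]
print(len(hits), filtered_count(hits))
```

The filter removes the self-identified bot but still counts the expander script, so the filtered total remains one higher than the single real click.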

So, who cares? The ugly truth about web analytics is nearly all of it involves some amount of error. The phrase should be Lies, damned lies, and web analytics. Analysis of web server logs can overcount automated traffic from web crawlers. Modern analytics programs that embed javascript on pages to thwart bots can undercount clients that don’t load the entire page or don’t execute javascript (a particular concern with mobile browsers). Panel-driven approaches like Nielsen can severely undercount because they extrapolate from a focus group that might not closely represent the actual Internet audience. Some techniques might be more accurate than others, but the central problem of web analytics is that you never know for certain how many of your visitors are actually humans. And it gets even fuzzier when you attempt to discern unique visitors from the flow of hits that are your site’s traffic; any estimation of visitors is built on assumptions about web usage and thus adds its own levels of distortion and error. I would hope many web analytics experts understand this and approach stats with appropriate grains of salt. For instance, if I see a surge in bitly clickthroughs for a New York Times article on twitter, is that incontrovertible evidence that twitter-themed stories are eagerly consumed by twitter users? Or is it more a case of automated bots grabbing and expanding any link tagged twitter? Or is it both? You’ll never know with absolute certainty.

This point matters, because bitly is one of the only truly open analytics sources out there. And people quote its numbers like they come straight from God, often because they have no alternatives. Most companies (my employer included) will not share their statistics with the general public, although I have not yet chased down the reason why. And bitly hits are recorded at the origin of following a link, rather than the destination, making them essential for understanding a service like twitter. But such ubiquity also deserves some skepticism. That certain argument about the worth of twitter followers suffers if actual traffic is half or a tenth what is reported. That outside audit of site traffic might include many small errors adding up to a big deviation from internal numbers. Bitly remains a highly useful service, but it’s important for all of us to remember how the hits are counted.

Filed under: Web

So Long Sections?

Over the past few years, there has been no shortage of real and virtual ink devoted to what the future of newspapers might be (although I would prefer if there were less writing and more doing). I’m not interested in rehashing the teeth-gnashing and grave-dancing in so many of these pieces but instead I’m interested in the way the structure of newspapers will change when they move to a web-only environment. This is the first of a few posts to explore that.

So what might change? For starters, consider the newspaper section. If you look at a newspaper from 100 years ago, it’s shocking to see how much content from various beats is jumbled onto pages next to each other. This might be tolerable for a newspaper of only 18 pages, but as newspapers grew in size and scope (in 2000, the New York Times published its largest weekday newspaper ever, at 174 pages) it became necessary to organize stories into sections. Sections make sense for a printed product. They enable a physical organization that allows readers to pick the content they like without an index by pulling out what they like and ignoring the rest. This becomes a selling point in itself for the hypersectionalized Sunday newspaper as highlighted in this New York Times ad for the Weekender (and ably parodied by the 92nd Street Y Tribeca here). Sections are also reflected in the organizational structure of a modern newsroom, where each desk is an autonomous crew of reporters, editors, designers, etc. all working to create their part of the newspaper.

On the Web, however, sections don’t really make sense. We don’t need the physical grouping and arrangement of stories, because we don’t have anything physical to group. We can find the content we want by searching for it, rather than pulling out a section of pages. And yet, if you visit any newspaper website, you’ll see the print sections staring right at you with their stories bullet-pointed below. And even if you go to online news ventures, the sections are still there. For instance, here is the organization of the Huffington Post: Home, Politics, Media, Business, Entertainment, Living, Style, Green, World, and Chicago. You can imagine how a print version of the Huffington Post would be organized by that alone. Google News has similar groupings. The Daily Beast is a little cheekier (Cheat Sheets, Big Fat Stories), but once you click into one of those you see the same sections (arts, science, etc.). This is not to say that classifying content by its sectional subject is useless, but it’s funny how a physical scheme continues to predominate online, where such an approach is no longer necessary. And how such an approach is stifling serious investigations into other ways of organizing the news.

In many ways, this is akin to your computer’s user interface. Unless you’re one of those people who does everything in Emacs, you probably interact with your computer via documents, folders, and other concepts designed to make the computer easier to learn by a novice. This Desktop Metaphor is why we interact with our 2009 computers by pushing around little icons of things that a 1950s secretary would recognize. What’s wrong with this approach? Well, what isn’t weird about pretending all sorts of data are little documents we can shuffle around virtual file cabinets? And as we add more interlinked and indexed content to our lives, the folder metaphor gets more stretched (what would the physical analog of an alias be? A search folder?). So, why does the desktop metaphor stick around? Possibly because most alternatives resort to eye-candy gimmickry more at home in a science fiction movie, and the perceived learning curve is too great to switch. But mostly because we’ve all been trained to think in folders and documents (just like Google’s greatest triumph has been to make us search in Google’s style). The Desktop Metaphor lives on because, through being around so long, it has become natural to computer users.

So, in that sense, my headline is misleading. Sections will probably persist online, but not without their own drawbacks. Take the issue of context. Articles in the physical newspaper gain an implicit context by their location in the newspaper (is it on the front page? is it in the metro section?) that is easily lost online, where that context is not so clear, especially when read in aggregators or search results. In this sense, Google is to newspaper sections what the iTunes store has been to the album. For instance, why has the New York Times published so many articles about Twitter in April? In the printed product, where things run in Dining, World, Technology, etc., it’s clear that each story has been produced by different sections working independently (the bottom-up view). On the website, where every twitter story falls under Technology, it seems like an overtly directed plan to incessantly promote twitter (the top-down view).

More importantly, I feel like the sectional viewpoint creates artificial silos that compartmentalize the news. Take the really big news stories of our time (the economic meltdown, swine flu, the wars in Iraq and Afghanistan, etc.) and it’s very easy to see how parts of these stories touch multiple sections. But the sectional organization obscures the holistic impact of these events by inviting readers to discard parts of the story because they only like reading about national news and not business, etc. Not to mention the difficulties presented to a reader who wants to get up to speed quickly on a topic. Picture, instead of sections, a topic-driven organization that would provide background and content tied to the big picture rather than the location where the print article would run. Would a newspaper without sections still be a newspaper? Perhaps not, but it’s time we killed sections, so the newspaper might live.

Filed under: Journalism

Another Twitter Talk

I’m not much of a blogger these days. Between my day job and some of its high-priority projects and my spare time at home with my family, including one rambunctious toddler, lengthy self-reflection is not often in the cards, especially since I blog there too!

But I do twitter. A lot. Which makes sense, because twitter offers me a sweet combination of low demands (even I can do 140 characters of text regularly), convenient access (via all sorts of third-party tools), and the superficial-yet-addictive connectedness of ambient intimacy. Recently, I was asked to deliver a few introductory talks about twitter at the New York Times—where I work and where I created the nytimes twitter feeds for my own benefit back in March 2007. I had done this before, but I lost those slides in a laptop malfunction, and I figured it would be fun to start fresh and experiment with a scrolling transition to suggest the fluid nature of twitter timelines. This is hard to see in a PDF, so I’m sharing them here in two formats.

Note that this is not the complete presentation. Although none of the information in here is confidential, I’ve been asked to remove a few slides near the end where I suggest some future directions for twitter usage at the New York Times. I have personally been thinking about twitter and the news a lot recently (more blog posts on that later) and thought it would be cool to suggest some ideas and examples. But I don’t want anyone to assume I am the Official Strategist for Twitter at the Times and any of my personal musings are corporate or editorial strategy. So, those slides had to go.

Enjoy.

Filed under: Presentations

Explaining Twitter

As regular readers might know, I’m a big fan of Twitter. Perhaps it’s my lamentably shrinking free time or the general state of fatigue that makes it hard for me to coerce sentences together into a coherent post, but it’s a lot easier for me to prolifically twitter about all the profound and mundane (okay, pretty much just the mundane) moments in my life. More personal than the referential style of tumbleblogging but also too short to generally encourage overweening preciousness, Twitter hits that nice sweet spot of letting my friends know what I’m up to, but without it becoming a large chore to do it. Yeah, twitter is stupid, but it’s the right kind of stupid in the way in which it emphasizes that communication is the glue of community and the ease in which it allows everyone to take part. Microblogging is here to stay.

Last week, I gave a talk at the New York Times about Twitter. At the Times, we regularly have lunchtime talks on various educational topics and I thought it would be fun to do one on twitter and why I think it matters (on a related note, it is interesting to see how various sections in the paper have covered the twitter phenomenon so far). As the developer behind the nytimes twitter feed, I also personally have an interest in seeing how twitter might mesh with more traditional forms of journalism and discuss what we could do further with the feeds, so it seemed like an excellent opportunity to talk about new technology at the Gray Lady. Here are the slides.

I was going for a more oblique visual style on the slides, so you might need to infer the context for a few of them, but the general thrust should be apparent. I’ve also had to redact a couple of slides where I made some suggestions about how the New York Times could expand and enhance its presence on twitter. There was nothing proprietary or wildly radical in them, but I wanted to just head off Gawker or other sites that might erroneously construe them as representing the Grand Official Vision for the New York Times on Twitter. There are some changes I would like to make to the feeds (visual branding of the icons is an obvious one), but my real goal here is to be part of a discussion at the paper and beyond about our place on Twitter and the modern Web at large. Let me know what you think. Thanks.

Filed under: Presentations

On Returning To Blogging Here After A Long Time Away

In which the author continues to use a title formulation—one that seems right out of the 19th century, but these days denotes a certain overweening preciousness well suited to be published by McSweeney’s—to explain his long absence from-,

Ah, screw it. The question on the minds of my remaining readers (all 10 of you) might be where the heck have I been (sorry for the sad invective, but I’ve been trying to cut down on my cursing for reasons that should soon be clear) and why am I blogging again now? 250-some days is a long time to be quiet, and it’s not like the blog was that awesome before it went on hiatus. What happened?

Good question. To be honest, the simple answer is I’ve been rather busy. For starters, I have still been blogging all this while, but for the New York Times’ open source initiatives at our blog Open. The main reason, though, is that I am the proud father of the most amazing kid in the world. It’s not that the baby keeps me from blogging; rather, it’s just that blogging doesn’t really compare at all to spending time with him (my personal coding productivity has similarly been very low). Especially since, to be bluntly honest, the writing on this blog had become as boring as listening to a Garrison Keillor marathon. Better not to do it.

So, why restart now? Because it just feels fun again. And because I actually feel like it might also be interesting as well to continue my musings on the future of newspapers (and my experiences and experiments along those lines) in a forum that is not as official and fraught with consequences for misplayed snark like the official New York Times blog would be.

This is not to say I will be dishing dirt and spilling secrets. I like my job enough to not want to lose it, and that’s not really my style. But I think it would be fun (at least to me) to post my occasional rants with perspective from inside the New York Times, and perhaps, if I’m lucky, fun for some random people on the Interwebs to read it. Sound like a plan?

If worse comes to worse, I’ll just stop again. It wouldn’t be the first time…

Filed under: Meta

On Appearing In The Background Behind Two Pulitzer Winners

Adventures in Background Lurking

Wow, this is just so unexpected… Where to begin…

Let me start by stating how amazed I am to be here. As a semipro quasi-journalist wannabe, I’ve been in awe of the Pulitzers for a long time. And while I have daydreamed many a tired morning of winning one, I never seriously believed I would find myself in this spot today: unwittingly standing in the background in a photograph of two Pulitzer winners.

I wish I could say it was skill: my uncanny knack for being sublimely oblivious to photographers focusing on their actual subjects a few feet in front of me. But I must admit that luck has played a far larger part in my current fortune than most other men might want to admit. If not for the chance proximity of me to these fine Pulitzer winners, my labor would be relegated to obscurity like so many other such pictures, scattered across the negatives and memory cards of so many tourists’ vacations.

Of course, fate may have delivered me to this moment, but once there my years of training guided me to success. I refrained from blinking, I didn’t pick my nose, I subconsciously stood at the right place to hide the coffee stain on my jeans, all of which made a difference in the selection of this photo over so many others. Luck may deliver you to these opportunities, but once there, it’s up to your talents to make the most of it.

But enough about me; sorry I’m rambling so much, it’s just such a crazy moment! To riff on Hillary, it certainly takes a village to take a Pulitzer crowd shot, and I have so many people to thank for making this day possible. Obviously, a lot of praise goes to Walt Bogdanich and Amy Harmon not just for truly excellent reporting that illustrates the power of journalism but also for standing in front of me at the decisive moment.

To the amazing NY Times photography desk, for their peerless skill at capturing the moment when the winners are smiling and I don’t have a dorky look on my face. They make it look easy, but it’s not! Photography was the key difference in bringing my story to light… Of course, thanks also go to Graphics and Computer Assisted Reporting, who led the way up the stairs but at a key moment went left while I went right. And to the Web Producers who ran this photo on the website, thus ensuring I had my 15 minutes of Internet fame to blog about. And that Website which also gave me a job so that I could one day stand here. Right behind the Pulitzer winners.

Of course, thanks also go out to Renzo Piano for designing this new building with its skylight that allows me to be bathed in flattering natural light as opposed to the harsh judgment of fluorescent.

And where would I be without Bill Keller who drove this story every step of the way: from calling the all-hands meeting to naming the awards to pointing out the Pulitzer winners just a few feet from where I haplessly stood.

Finally, none of this would be possible without the fine work of our publisher Arthur Sulzberger, Jr. Not only does he continue the best damn newspaper in the whole world. Not only did he give us this fine new newsroom. But he continues the tradition of championing excellence and integrity in journalism that all of us stand behind.

For some of us, more literally than others. Thank you.

Filed under: Humor

Rails Under the Knife

After months of rehearsing and revising, I finally gave the talk at OSCON. I think it could use another month of refinement, but people seemed to enjoy it, and I actually enjoyed giving it as well. If you were at the talk, thank you for coming and feel free to let me know if you have any feedback or questions.

My talk was Rails Under the Knife, a look at some of the internals of Rails to get a better idea of 3 powerful Ruby techniques:

  • Metaprogramming
  • Reflection
  • Blocks

You can download the slides at

This talk is aimed at an intermediate Rails programmer who knows the basics of Rails coding (I have another similar talk for beginners called Rubyisms in Rails), but still is a bit unsure about the power trio of serious Rails hackery. Hopefully, this will help to provide some inspiration for you to delve into the Rails code on your own. Enjoy.

Filed under: Presentations

See You at OSCON, Maybe…

I know I haven’t been posting to this blog much lately (see this link for one reason why). But I just wanted to note that I will be giving a talk at OSCON 2007 in Portland this week, and I would love to talk to you if you are there.

In addition, my coworker will be giving a talk on the DBSlayer (one of the projects we’re open sourcing) that should be fun to see.

Finally, be sure to come to one of our Birds of a Feather (BoF) sessions:

Now I just have to make it there, which is the subject of another angry post about the disconnecting effects of technology, or how United Airlines can oversell flights and then randomly bump a father and mother with a 2-month-old infant, leaving them to scramble for a flight the next day. To be followed by 45 minutes waiting in a customer service line only to be told I’m supposed to go to a customer service station in the next terminal. Joy!

The shorter version of this story: United Airlines is a miserable and indecent excuse for a business and when they go bankrupt again, I shan’t shed a tear…

Filed under: Meta

Unsafe At Any Speed

Last night, I almost saw my dog killed in front of my very eyes.

We were crossing with the light on 7th Avenue in Brooklyn, having just stopped by Cafe Steinhof to say hi to some of the people outside. When an SUV accelerated into the left turn, I saw a front bumper miss me by an inch and pass over my dog’s head. I then had to pull her out of the front wheel well before he crushed her and bring her back to the curb. The driver was oblivious to my screaming and kept on down the road. Nobody got the plates, and I was on my way running down a few blocks to the pet hospital with Bella in my arms (when you’re on adrenalin, a 55-pound dog seems much lighter).

We should probably rename her “Lucky.” After $500 of tests and screening, she may have emerged without much in the way of major injury (I’m hoping a tear in her Anterior Cruciate Ligament isn’t in the cards though), but she hasn’t been too happy here at home, even with the painkillers. So, I guess I should be feeling fortunate…

But I just feel angry instead. This was not an accident that had to happen, and I think the fact it was a massive SUV was the cause of the problem. This is not to say that there are no bad drivers for smaller cars, but it’s a lot harder to run someone over and not notice in a MINI. The driver did not race away in a panic, we were just a bump in the road, a minor skip for the CD system, nothing to notice. And that was the truly scary part to me. How can car makers talk about the “safety” of your vehicle, when they’re really engineering a decrease in safety for everyone else? And what does this do to our cities, our public places, when we create these speed lanes for the oblivious and disconnected to barrel through without any caution? There is no single point to blame here, but I feel like we’re engineering a society disconnected from the effects of its actions, insulated from the outside world, and craving more of the same. And that just fills me with sadness this morning.

Filed under: Personal

Blogging? Me?

Sometimes, it’s best to just come out and say it, so here we go: I suck at blogging these days. This is not me beating myself up – I don’t feel bad in the slightest – I just am acknowledging the truth of the matter. Sorry to any of my readers who haven’t consigned me to the dustbins of their feed readers yet, but it’s unlikely I will be producing any riveting content anytime in the near future. I have so many changes coming up in my life – new coop! baby! projects here at the Times! – that I’m too spent to blog when I get home (of course, thanks to Time-Warner Cable I haven’t had Internet at home anyway) and don’t produce any writing that meets my high standards for Original Blogging Content (TM).

However, I do still have time to feed content into a few other places, for those that need to get their Jake fix. For starters, you can follow the minutiae of my random thoughts (all fewer than 140 characters) at Twitter. In addition, I have started what’s known as a tumbleblog over on Tumblr, which is where I will post content that’s the opposite of Nimble Code: pithy, snarky, non-technical, and varied. Feel free to check both out and one day I will get back to writing here as well.

Finally, if you were interested in some of the topics from my Future of Newspapers posts but want to see a professional journalist’s perspective, I strongly suggest Frontline’s News War series, being broadcast now on PBS and also viewable on the Web.

Filed under: Meta

About Me

My name is Jacob Harris, and I am a Senior Software Architect at the New York Times. This is my personal blog, and so these views are my own and do not represent any official positions of my employer. It may seem stupid to say this, but people have a habit of making the most stupid assumptions about what the Times is doing online, so I have to say it. For most of you, sorry for doubting your intelligence there.