Archive for the ‘ Presentations ’ Category

Working With Wikileaks

I recently had the honor of speaking at a Hacks/Hackers meetup about working with the Wikileaks data. I talked about some of the work I did with Sabrina Tavernise and Kevin Quealy looking at the sectarian violence in Baghdad during the occupation. Sabrina’s final story “Mix Of Trust And Despair Helped Turn Tide In Iraq” is our data-driven look back at the violence, and A Deadly Day in Baghdad is the graphic we ran alongside it, mapping the violence both on a single day and over the entire course of the occupation.

You can read my slides with their notes here:

My talk was organized around three principles of data journalism. These are just three rules that popped into my head while planning out the talk. I am sure there are more than three, and that there are more artful ways to express them, but these three rules articulate the things I think about when working with data. They also helped me explain why I hate word clouds with a violent passion. So, what are these rules?

Find A Narrative. There is nothing that earns my scorn more than when a site does “data reporting” by just posting a data table without any context or a raw PDF. Word clouds are equally bad. If you want to report on data, you must first find a narrative in it. It doesn’t have to be the narrative — rich data sets have many narratives — but you need to find at least a narrative. This is essential for several reasons. It allows you to focus your inquiries around a single story and investigate specific questions. It also provides the means for the reader to understand and get more involved. Finally, it gives us ways of narrowing large data sets down to the information necessary to report the story without overwhelming readers with distractions. Human beings understand the world through narrative; find the narrative if you want to understand your data.

Provide Context. The other reason why I am usually dismissive of calling raw data dumps journalism is their lack of context. Most data is complex, with its own nuances, jargon, and quirks. This was especially true of the Wikileaks War Logs, which were heavily laden with military jargon and inside information. It became very obvious from the beginning that we would need to do more to explain these documents and what was happening in them, or we would overwhelm our readers. And so, when charting out the violence in Iraq, we provided some information about why there were concentrations of violence in certain neighborhoods. And when presenting the raw field reports, we worked to add an inline jargon translator that would help decipher that an LN is a local national or a COP is a combat outpost (my personal favorite bit of inexplicable jargon: a “monica” in Iraq is a white Toyota Land Cruiser).
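A minimal sketch of how an inline jargon translator like that might work (the glossary entries come from the article; the code itself is hypothetical and not the Times’ implementation):

```ruby
# Hypothetical inline jargon translator: annotates known military
# abbreviations in a report with their plain-English expansions.
JARGON = {
  'LN'  => 'local national',
  'COP' => 'combat outpost',
}

def translate_jargon(text)
  # Match whole words only, so 'COP' doesn't fire inside a longer word.
  text.gsub(/\b(#{JARGON.keys.join('|')})\b/) do |abbr|
    "#{abbr} (#{JARGON[abbr]})"
  end
end

translate_jargon("The LN was taken to the COP.")
# => "The LN (local national) was taken to the COP (combat outpost)."
```

The real translator worked inline in the page rather than rewriting the text, but the lookup idea is the same.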

Work The Data. I always have to laugh at myself when some technology X is declared the savior of journalism. Or when someone writes a script to scrape twitter and declares it investigative journalism. So far, technology has done a far better job of collecting data than analyzing it; analysis still remains very difficult. There is often no magic technology that will figure it out on its own; you’re going to have to work with it (I continue to feel everybody should read Brooks’ essay “No Silver Bullet” for its notions of accidental complexity and essential complexity). Furthermore, no data is perfect; when working with data, you must be aware of its inherent flaws and limitations. In the case of the Wikileaks data, there were many questions without answers. How much duplication was there in the database? How was the data on civilian homicides collected? Could we definitively say whether the Surge worked, or was the decline in violence mainly because there was nobody left to kill and less will to endure the atrocities any longer? Having an idea of the methodology would’ve helped, but in this data, we didn’t even know who was collecting the homicide reports. Were there forces with the responsibility to assemble an accurate record? Or was it all just happenstance? When the killings waned, was it an actual decline in fatalities, or an illusion caused because units were too busy in the Surge to record the violence still happening? Could the leaker have forgotten to copy some records?

From a journalistic standpoint, this data was troubling. But luckily we had a few things to compare it against. Sabrina Tavernise reported from Baghdad while the sectarian violence raged, so she was able to name neighborhoods where we would expect to see the worst violence. To help out with this, I figured out how to extract MGRS coordinates from the reports and plot them on the map. This gave us a more accurate view of where the homicides were reported, although that too was not perfect (some coordinates did not geocode, and there were 40+ reports that geocoded to Antarctica). To get a feel for duplicates, we picked a single day, and Sabrina read through each homicide report, flagging roughly 20% of the reported deaths as duplicates (and finding cases where fatalities were not counted correctly as KIAs), so we could accurately show the violence of a single day.
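Pulling the MGRS strings out of free-text reports is a regular-expression job; converting them to latitude/longitude requires a geodesy library and is elided here. A rough sketch of the extraction step (this is my illustration, not the script I actually used):

```ruby
# Rough shape of an MGRS coordinate: a grid-zone designator
# (1-2 digits plus a latitude band letter), a two-letter 100km
# square ID, then an even-length run of digits for easting/northing.
# The letters I and O are never used in MGRS band or square letters.
MGRS_PATTERN = /\b\d{1,2}[C-HJ-NP-X][A-HJ-NP-Z]{2}\d{2,10}\b/

def extract_mgrs(report_text)
  # Keep only candidates whose digit tail splits evenly into
  # easting and northing halves.
  report_text.scan(MGRS_PATTERN).select { |m| m[/\d+\z/].length.even? }
end
```

A real extractor would also handle whitespace inside the coordinate and validate the grid zone against Iraq’s actual zones (38S, for example).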

On a parallel track, I really wanted to show how the violence rolled across Baghdad’s neighborhoods, so I hacked up a Ruby script to output all homicides within a bounding box for Baghdad to a KML file, with a colored circle and count indicating the number of deaths. I added a little extra logic to split up roundup reports listing bodies found throughout Baghdad into individual points on the map. Finally, I found a map of Baghdad’s neighborhoods to use as a background. When the whole timeframe of the data was stepped through as an animation, you could see the violence surge in religiously mixed neighborhoods like Dora and Ghazaliya, which Sabrina had indicated from first-hand experience were where the worst violence was happening. To confirm we weren’t missing any data in the leak, we matched up the weekly homicide chart against the publicly released SIGACTS III numbers to verify the curves looked the same (the counts were larger in the Wikileaks data because of duplicates). This data was ultimately used to visualize the year-by-year chart of the violence at the bottom of the graphic.
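A sketch of what that KML output step might look like. The field names and styling here are illustrative rather than taken from my actual script, and since KML has no native circle primitive, a scaled icon stands in for the colored circle:

```ruby
# Hypothetical input: homicide reports already filtered to a
# Baghdad bounding box, each with a lat/lon and a body count.
Report = Struct.new(:lat, :lon, :deaths)

def to_kml(reports)
  placemarks = reports.map do |r|
    <<~KML
      <Placemark>
        <name>#{r.deaths} killed</name>
        <Style><IconStyle>
          <color>ff0000ff</color>
          <scale>#{Math.sqrt(r.deaths)}</scale>
        </IconStyle></Style>
        <Point><coordinates>#{r.lon},#{r.lat},0</coordinates></Point>
      </Placemark>
    KML
  end
  "<?xml version=\"1.0\"?>\n" \
    "<kml xmlns=\"http://www.opengis.net/kml/2.2\">\n" \
    "<Document>\n#{placemarks.join}</Document>\n</kml>\n"
end
```

Scaling the icon by the square root of the death count keeps the circle’s *area* roughly proportional to fatalities, which is the honest way to size symbols on a map.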

For the original version of the graphics, Kevin included numbers of civilian deaths for each year. Since each report included a summary table of people wounded and killed, it was only a trivial SQL statement to get body counts for each year in Baghdad. But the exactitude of those numbers implied a certainty that didn’t exist. We simply didn’t know how much duplication and omission there was in the raw reports, and so we decided to pull the numbers rather than use them (for a more detailed exploration of the data’s limitations, see the Iraq Body Count project). Just because it’s easy to derive a number from the database doesn’t mean that number is correct. This may all seem academic when we can just run a query and say “this is what the data contains.” But we don’t report on data itself. We report on reality, and we must assess how accurate a model of reality the data is.
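The query really was the trivial part. A pure-Ruby stand-in for that aggregation, with invented column names (the actual schema isn’t reproduced here):

```ruby
require 'date'

# Each report carried a summary casualty table, so yearly totals
# are a one-line group-and-sum. The equivalent SQL was roughly:
#   SELECT strftime('%Y', report_date), SUM(civilian_kia)
#   FROM reports GROUP BY 1
# (table and column names are hypothetical).
def civilian_deaths_by_year(reports)
  reports.group_by { |r| r[:date].year }
         .transform_values { |rs| rs.sum { |r| r[:civilian_kia] } }
end
```

The point of the anecdote stands regardless of the code: the aggregation is easy; knowing whether the inputs deserve to be summed is the hard part.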

So yes, it’s work to report on data, but it’s thrilling work at times. Hopefully, you’ll be inspired to work with data yourself.

Postscript: I forgot to include an interview with the CJR about the graphic last fall.

Another Twitter Talk

I’m not much of a blogger these days. Between my day job and some of its high-priority projects and my spare time at home with my family, including one rambunctious toddler, lengthy self-reflection is not often in the cards, especially since I blog there too!

But I do twitter. A lot. Which makes sense, because twitter offers me a sweet combination of low demands (even I can do 140 characters of text regularly), convenient access (via all sorts of third-party tools), and the superficial-yet-addictive connectedness of ambient intimacy. Recently, I was asked to deliver a few introductory talks about twitter at the New York Times—where I work and where I created the nytimes twitter feeds for my own benefit back in March 2007. I had done this before, but I lost those slides in a laptop malfunction, and I figured it would be fun to start fresh and experiment with a scrolling transition to suggest the fluid nature of twitter timelines. This is hard to see in a PDF, so I’m sharing them here in two formats.

Note that this is not the complete presentation. Although none of the information in here is confidential, I’ve been asked to remove a few slides near the end where I suggest some future directions for twitter usage at the New York Times. I have personally been thinking about twitter and the news a lot recently (more blog posts on that later) and thought it would be cool to suggest some ideas and examples. But I don’t want anyone to assume I am the Official Strategist for Twitter at the Times, or that any of my personal musings are corporate or editorial strategy. So, those slides had to go.


Explaining Twitter

As regular readers might know, I’m a big fan of Twitter. Perhaps it’s my lamentably shrinking free time or the general state of fatigue that makes it hard for me to coerce sentences together into a coherent post, but it’s a lot easier for me to prolifically twitter about all the profound and mundane (okay, pretty much just the mundane) moments in my life. More personal than the referential style of tumbleblogging but also too short to generally encourage overweening preciousness, Twitter hits that nice sweet spot of letting my friends know what I’m up to, without it becoming a chore to do so. Yeah, twitter is stupid, but it’s the right kind of stupid in the way it emphasizes that communication is the glue of community and the ease with which it allows everyone to take part. Microblogging is here to stay.

Last week, I gave a talk at the New York Times about Twitter. At the Times, we regularly have lunchtime talks on various educational topics, and I thought it would be fun to do one on twitter and why I think it matters (on a related note, it is interesting to see how various sections in the paper have covered the twitter phenomenon so far). As the developer behind the nytimes twitter feed, I also personally have an interest in seeing how twitter might mesh with more traditional forms of journalism and in discussing what we could do further with the feeds, so it seemed like an excellent opportunity to talk about new technology at the Gray Lady. Here are the slides.

I was going for a more oblique visual style on the slides, so you might need to infer the context for a few of them, but the general thrust should be apparent. I’ve also had to redact a couple of slides where I made some suggestions about how the New York Times could expand and enhance its presence on twitter. There was nothing proprietary or wildly radical in them, but I wanted to just head off Gawker or other sites that might erroneously construe them as representing the Grand Official Vision for the New York Times on Twitter. There are some changes I would like to make to the feeds (visual branding of the icons is an obvious one), but my real goal here is to be part of a discussion at the paper and beyond about our place on Twitter and the modern Web at large. Let me know what you think. Thanks.

Rails Under the Knife

After months of rehearsing and revising, I finally gave the talk at OSCON. I think it could use another month of refinement, but people seemed to enjoy it, and I actually enjoyed giving it as well. If you were at the talk, thank you for coming and feel free to let me know if you have any feedback or questions.

My talk was Rails Under the Knife, a look at some of the internals of Rails to get a better idea of three powerful Ruby techniques:

  • Metaprogramming
  • Reflection
  • Blocks

You can download the slides at

This talk is aimed at an intermediate Rails programmer who knows the basics of Rails coding (I have another similar talk for beginners called Rubyisms in Rails) but is still a bit unsure about the power trio of serious Rails hackery. Hopefully, this will help to provide some inspiration for you to delve into the Rails code on your own. Enjoy.
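The slides dig into the Rails source itself, but the power trio can be illustrated in a few lines of plain Ruby. This toy model is mine, not an excerpt from the talk or from Rails:

```ruby
# A toy ActiveRecord-ish class showing all three techniques.
class Record
  # Metaprogramming: a class method that *writes* accessor methods
  # at class-definition time, the same move as has_many or validates.
  def self.column(name)
    define_method(name)       { @attrs[name] }
    define_method("#{name}=") { |v| @attrs[name] = v }
  end

  column :title

  def initialize
    @attrs = {}
  end

  # Blocks: yield the new record to the caller for configuration,
  # the same shape as ActiveRecord's create { |r| ... }.
  def self.build
    record = new
    yield record if block_given?
    record
  end
end

r = Record.build { |rec| rec.title = "Under the Knife" }

# Reflection: asking an object at runtime what it can do --
# Rails leans on this constantly for things like method_missing
# fallbacks and schema-driven attributes.
r.respond_to?(:title)  # => true
```

Each technique is mundane on its own; Rails gets its apparent magic by composing all three.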

Packaging Your First Gem With Hoe

My name is Jacob Harris, and I am a rubygem addict. I’d estimate I have hundreds of them tooling around on my hard drive: useful little snippets of Ruby, C library wrappers, random noodlings. I might not actually use most of them in any of my projects, but like a vast library of unread books, I enjoy having them around. For the longest time, though, I’ve been a freeloader. I’ve downloaded gems, but I’ve never written one; it’s time I started giving something back.

And so, I’ve written my first gem. It’s nothing incredible – all things have to start simple – but I like it. It’s called Amazon Hacks, and it consists of two classes (for now) to benefit people whose sites handle Amazon links. The Amazon::Hacks::Link class contains a few methods to extract an ASIN from any product link, normalize product links, and append an affiliate ID easily. The slightly sillier Amazon::Hacks::Image class puts a convenient Ruby wrapper around the convoluted syntax Amazon uses for its image transformation engine. If you work on a site that links to Amazon product pages (e.g., All Consuming), try it out and let me know if it works for you or how it can be improved. It’s simple to get started; just run
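To give a feel for what the Link class does, here is a self-contained sketch of its two core tricks, extracting the ASIN and appending an affiliate tag. This is illustrative code, not the gem’s actual implementation, and the URL patterns covered are only the common ones:

```ruby
# The ASIN is the 10-character product ID that appears after
# /dp/, /gp/product/, or the older /exec/obidos/ASIN/ path.
AMAZON_ASIN = %r{/(?:dp|gp/product|exec/obidos/ASIN)/([A-Z0-9]{10})}i

def extract_asin(url)
  url[AMAZON_ASIN, 1]
end

# Rebuild a clean canonical product link and tack on an affiliate ID.
def affiliate_link(url, tag)
  asin = extract_asin(url) or return nil
  "http://www.amazon.com/dp/#{asin}?tag=#{tag}"
end
```

Normalizing through the short `/dp/` form also strips the session and referral cruft Amazon piles onto its URLs.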

gem install amazon-hacks --include-dependencies

Which brings me to hoe. Last night, I gave a talk on hoe to the NYC.rb group and the slides are here if you want to learn more about the process.

Before last week, I had no idea what it takes to create a gem, but it seemed like a lot, and I had better things to distract myself with. And it does indeed require a fair amount of busy work — what I call administrivia — to turn your snippet of Ruby code into a packaged gem, and this work has to be started anew for each gem you want to create. Now, as pragmatic programmers, we learn to automate menial tasks whenever possible, and hoe makes the creation of gems a lot more manageable by automating the busy-work away via a set of useful rake tasks. The result is more time for coding, faster releases, and a greater likelihood that you’ll release that gem in the first place. So, give hoe a shot, learn about gems, and start writing gems. You’re a brilliant Ruby coder; it’s time to share it with the world. And when you write that gem, I’ve got a cherished spot on my hard drive for it.
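For flavor, the entire Rakefile for a hoe-managed gem can be as small as this. The project metadata below is invented, and hoe’s API has changed over the years (newer versions use `Hoe.spec` rather than `Hoe.new`), so treat this as a sketch of the shape rather than copy-paste material:

```ruby
# Rakefile -- hoe generates the package, release, test, and
# rdoc rake tasks from this single declaration.
require 'hoe'

Hoe.new('amazon-hacks', '0.1.0') do |p|
  p.summary = 'Helpers for normalizing Amazon product links'
  p.author  = 'Jacob Harris'
end
```

Run `rake -T` afterward to see everything hoe wired up for you.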

