31st May 2011

Playing Photo Snap: Gauge and Button

Much excitement last week, well for me anyway.

I've been banging on for ages about setting up the new sparkly Zeitgeist, which pulls its data from the Omniture API (Omniture being an analytics package we use) rather than having Omniture email reports over once an hour. The benefit of the API method is that it's easier to control the throughput, backing off when things get busy and so on.
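To make the "backing off" idea concrete, here's a minimal Python sketch of a poller that doubles its wait whenever the API says it's busy and snaps back to the normal interval once things calm down. It isn't the Zeitgeist's actual code: the class and method names are hypothetical, and how "busy" gets detected (an error response, a slow reply) is left as an assumption.

```python
import random


class AdaptivePoller:
    """Hypothetical poller that backs off when the upstream API is busy.

    AdaptivePoller, record() and next_delay() are illustrative names,
    not part of the Omniture API or the Zeitgeist code.
    """

    def __init__(self, base_delay=60.0, max_delay=3600.0, factor=2.0):
        self.base_delay = base_delay  # seconds between polls when all is well
        self.max_delay = max_delay    # never wait longer than this
        self.factor = factor          # multiply the delay by this on each busy reply
        self.delay = base_delay

    def record(self, busy):
        """Update the delay after a poll; busy=True means 'slow down'."""
        if busy:
            self.delay = min(self.delay * self.factor, self.max_delay)
        else:
            self.delay = self.base_delay

    def next_delay(self, jitter=0.1):
        """Return the wait before the next poll, with a little random jitter
        so several pollers don't all wake up at the same moment."""
        return self.delay * (1 + random.uniform(-jitter, jitter))


poller = AdaptivePoller(base_delay=60.0, max_delay=600.0)
delays = []
for busy in [True, True, True, True, False]:
    poller.record(busy)
    delays.append(poller.delay)
print(delays)  # [120.0, 240.0, 480.0, 600.0, 60.0]
```

The cap matters as much as the doubling: without `max_delay` a long busy spell would push the next poll out indefinitely.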

Well, various things across the Guardian site are controlled by the data coming out of the Zeitgeist, much of it using the Zeitgeist's own APIs and often hitting them at a quite excessive (but necessary) rate. This is generally OK, apart from when the big pile of emails arrives, at which point those requests and the parsing of the CSV logs that the emails trigger are all happening at the same time...

Which in turn causes some Task Queue misses, which get re-queued, sometimes creating a Task Queue Cascading Failure Storm - for want of a better name :) In short, a bit of a mess. I end up having to pause and purge the queues manually, which is not the point, as COMPUTERS ARE SUPPOSED TO SERVE US, NOT THE OTHER WAY ROUND.
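One way to stop a burst of misses snowballing is to cap how many times a task gets re-queued. Below is a minimal Python sketch of that idea using a plain in-memory queue; it's a hypothetical illustration, not how App Engine's Task Queue works internally (App Engine queues can be given their own retry limits, e.g. `task_retry_limit` under `retry_parameters` in `queue.yaml`). The function and task names are made up.

```python
from collections import deque


def drain_queue(tasks, handler, max_attempts=3):
    """Run tasks, re-queuing failures at most max_attempts times in total.

    Hypothetical sketch: tasks that keep failing are parked in a
    dead-letter list instead of being re-queued forever, which is what
    turns a burst of misses into a runaway retry storm.
    """
    queue = deque((task, 1) for task in tasks)
    done, dead_letter = [], []
    while queue:
        task, attempt = queue.popleft()
        try:
            handler(task)
            done.append(task)
        except Exception:  # broad on purpose: any failure counts as a miss
            if attempt < max_attempts:
                queue.append((task, attempt + 1))  # try again later
            else:
                dead_letter.append(task)  # give up and leave it for a human


    return done, dead_letter


failures = {"report-c": 1}  # report-c fails once, then succeeds


def flaky_handler(task):
    if task == "report-b":
        raise RuntimeError("always fails")
    if failures.get(task, 0) > 0:
        failures[task] -= 1
        raise RuntimeError("fails once")


done, dead = drain_queue(["report-a", "report-b", "report-c"], flaky_handler)
print(done)  # ['report-a', 'report-c']
print(dead)  # ['report-b']
```

Failed tasks go to the back of the queue, so one persistently broken task can't starve the healthy ones queued behind it.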


Last week we pointed the backend parts of the Guardian at the new Zeitgeist endpoints.

This is scary.

It's scary because all of a sudden the hundreds of requests that were pointing at the old system are now pointing at the new system.

It's scary because if it all fails then it'd have noticeable effects on parts of the Guardian's site.

It's scary because it's a live system and live data.

But I laugh in the face of such a situation, because continuous deployment is where it's at: pushing code live with the flip of a switch and the press of a button.

And I'd tested the fuck out of it.

And I'm awesome.

Needless to say, it worked just fine. Actually, it's more than fine: the new system is processing more reports from Omniture (there's a practical limit on the number of reports that can be emailed out) for far less CPU time:


| Metric | Old Zeitgeist | New Zeitgeist |
|---|---|---|
| Number of instances | 8 | 4 |
| Average QPS | 0.264 | 0.216 |
| Average latency | 961.8 ms | 376.7 ms |
| Average memory | 80.9 MB | 28.4 MB |
| CPU hours per day | 26.5 | 7.5 |
| Cost per day | $1.99 | $0.10 |

Of all those values, CPU hours per day is the main one: the new Zeitgeist now costs about $3 per month to run (soon to change with AppEngine moving out of beta, I'm sure), down from about $60 per month for the old one.
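For anyone checking the sums, here's the arithmetic behind those monthly figures as a small Python snippet. The daily numbers come straight from the table; the flat 30-day month is an assumption, which is why the results are "about" $60 and $3.

```python
# Daily figures copied from the Old/New Zeitgeist table.
old = {"cpu_hours_per_day": 26.5, "cost_per_day": 1.99}
new = {"cpu_hours_per_day": 7.5, "cost_per_day": 0.10}

DAYS_PER_MONTH = 30  # assumption: a flat 30-day month


def monthly_cost(daily):
    """Scale a daily cost up to a (30-day) month."""
    return daily * DAYS_PER_MONTH


def reduction(before, after):
    """Percentage drop going from before to after."""
    return 100 * (before - after) / before


print(f"Old: ${monthly_cost(old['cost_per_day']):.2f}/month")  # Old: $59.70/month
print(f"New: ${monthly_cost(new['cost_per_day']):.2f}/month")  # New: $3.00/month
print(f"CPU hours cut by {reduction(old['cpu_hours_per_day'], new['cpu_hours_per_day']):.1f}%")  # 71.7%
print(f"Daily cost cut by {reduction(old['cost_per_day'], new['cost_per_day']):.1f}%")  # 95.0%
```

The cost falls proportionally much further than the CPU hours do, so the bill isn't simply CPU hours times a fixed rate.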

There are still a few things running off the old Zeitgeist, namely some trending stuff. That actually happens in the "Brain", but it's making calls to the Zeitgeist; once I've moved those off, the old one can be decommissioned and put to rest.

Which will be nice.

Number of meetings I had on Wednesday down in London: Lots
Hours of programming I got done last Wednesday down in London: 1.5
Was that actually done on the train on the way down? Yes
Bonus: have I also now learnt how to correctly pass JSON to RESTful endpoints in Python? Yes
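Since that last one is the sort of thing worth writing down, here's a minimal sketch of posting JSON using only Python 3's standard library. It's a present-day illustration, not the code from the train: the endpoint URL and payload fields are hypothetical. The two bits that are easy to get wrong are sending the body as a JSON string encoded to bytes (rather than a form-encoded dict) and setting the `Content-Type` header to `application/json`.

```python
import json
import urllib.request


def build_json_request(url, payload):
    """Build a POST request whose body is the JSON-encoded payload."""
    body = json.dumps(payload).encode("utf-8")  # JSON text -> bytes
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(url, payload, timeout=10):
    """Send the request and decode the endpoint's JSON response."""
    req = build_json_request(url, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical endpoint; nothing is sent here, we only inspect the request.
    req = build_json_request("http://example.com/api/reports",
                             {"report": "top-pages", "limit": 10})
    print(req.get_method())                 # POST
    print(req.get_header("Content-type"))   # application/json
    print(req.data)                         # b'{"report": "top-pages", "limit": 10}'
```

Note that `urllib.request.Request` stores header names capitalised (`Content-type`), so that's the spelling `get_header` needs.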

### About

This is what used to be a blog, but is now an online diary/archive sort of thing. As I'm laying off Twitter a bit and currently reclaiming all my Flickr photos, this is the main place to find me (subscribing via RSS should still work).

I also have a podcast, with more audio experiments on SoundCloud. A smaller "scrapblog" is over on Tumblr. If you need to get hold of me, email hello@revdancatt.com
