> root / 2011 / June / 2nd

Henri Lefebvre

A short while ago, alongside Meg Pickard, I spoke at FutureEverything. We wanted to talk about the roles of Editors, Robots, Strangers & Friends in the act of curation and the overall online experience. More about that some other time, though, because I want to work through a small section of that talk: a bit I mentioned briefly about "Data Scientists".

I don't believe in them.

Data Science I don't have a problem with. It's the recent rise in "Data Scientist" jobs that I think is missing an important aspect: the dance, the rhythm. For example, read and view this, then come back here once you've gotten over the urge to throw up and stab things. I'll wait...

Data Science & The Role of the Data Scientist

If you haven't torn your own eyes out I'll try to explain more.

Firstly Programmers (or Software Engineers if you like)... in a nutshell programming consists of Loops, if statements, Lists, Sorts, Ordered Database Queries, maybe some binary searches and a bunch of common sense. A variety of algorithms, message queues and endpoints if you want to get fancy but anything more and you're generally overcomplicating the problem.

The one thing programmers didn't have until recently, mainly because of limitations in computing, was the ability to deal easily with huge datasets. Data the size of a city or a country, highly populated dynamic data.

Projects such as MapReduce and Hadoop are now allowing that to happen. You get to throw a whole bunch of data into a distributed system and make it bend to your will using cunning and guile.
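For anyone who hasn't met the idea, here is a minimal single-machine sketch of the map/shuffle/reduce shape, in Python. It is not Hadoop and distributes nothing; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. The point is only that each phase is simple enough to be farmed out across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: turn one record into (key, value) pairs, here (word, 1)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    """Run the three phases in sequence on one machine (illustration only)."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data", "Big rhythm"])
```

In a real cluster the map and reduce steps run in parallel on different nodes and the shuffle moves data between them; the logic per step stays about this small.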

A Data Scientist basically appears to be a Programmer + Big Data + Statistics.

However, it's not something that generally needs to be done in our web world. Organizations such as Facebook, Twitter, Nokia, FourSquare and so on, who have the large data sets, are the ones who gain the most from it. Anything smaller (which is most things) and you can carry on with your more traditional Loops, Sorts, Message Queues, DBs and indexes.

And because it's fairly new it's often these organizations that are inventing or contributing to the technologies used. See The Engineering Behind Twitter’s New Search Experience to get a good feel for that.

Which is why, when another organization starts dealing with "Big Data", they advertise for a Data Scientist, but there just aren't that many people with the experience. I know many programmers; I know two, maybe three, people who can handle big data.

So is it as simple as being a programmer and getting Hadooped up to fill this market niche?

Nope, and here's why...

"in order to grasp and analyze rhythms, it is necessary to get outside them, but not completely: be it through illness or a technique. A certain exteriority enables the analytic intellect to function. However, to grasp a rhythm it is necessary to have been grasped by it; one must let oneself go, give oneself over, abandon oneself to its duration. Like in music and the learning of a language (in which one only really understands the meanings and connections when one comes to produce them, which is to say, to produce spoken rhythms). In order to grasp this fleeting object, which is not exactly an object, it is therefore necessary to situate oneself simultaneously inside and outside."

  • Henri Lefebvre from Rhythmanalysis: Space, Time and Everyday Life.

Lefebvre in his 1992 collection of essays talks about the rhythm of cities. To me this is the flow of the people, the morning coffee routine, the lunchtime decisions, the evening meandering, the beat of the bar on a Friday night, the sweat dripping off the ceiling of a tiny club, the sun coming up late on a Saturday night. Strangers exchanging stolen kisses under umbrellas, the race across the road in a gap in the traffic, the sudden surprising green park round the corner, the hidden entrance to the underground stations.

How people shape the city, the pulse as agents gather together to form a temporary autonomous zone before collapsing back to being shaped by the city. To be not just in the city, but of the city.

I'm not a fan of cities. I lived in one, San Francisco, and avoid another where I work, London. I can't design for cities as I don't understand them.

Lefebvre's writing suggests that to analyze a city you need to have been consum/mat/ed by it.

This to me is the same as Big Data. You can't just turn your Data Scientist eye onto something and say "Oh we'll throw this into MapReduce, it'll be awesome". You need to have been part of that data, to have lived it. We don't have Big Data where I work at the Guardian, we have lots-of-data; we look at Big Data out there and attempt to consume the signals. I came from Flickr, which had fairly big fast data, and the Guardian is positively quaint in comparison (in terms of what it generates). I set myself the task of getting immersed in the flow of news, trying to understand how the organization worked: the signals, the input, the output. The difference between news on a Monday and news on a Friday, the waves that Google and other sites can throw at you, and so on. Living in the data, watching its rhythms, the pulse, the flow. I'm getting there, it takes a while, maybe I'm just old :)

To deal with big data you have to have been in it, not as a Scientist but as a Dancer. I would at this point direct the dear reader to the music track Chime by Orbital (spotify) but I fear it may be pushing the analogy a step too far.

But that isn't enough. Many people are already immersed in the data; here, our journalists know all this stuff inside out. Getting carried away by the rhythms is as easy as getting in and letting yourself go. According to Lefebvre you then have to get back out again.

And that's the trick...

"in order to grasp and analyze rhythms, it is necessary to get outside them, but not completely: be it through illness or a technique. A certain exteriority enables the analytic intellect to function."

You can't have someone who's a "Data Scientist" just turn up and apply their tools, clusters and statistics. They haven't been in it enough. And you can't have someone who's within the company, who understands and feels the flow of data every day, unless, unless they know how to separate themselves, to get outside. When people grow with a company, love the company, understand everything that company could be, getting outside it is a hard-won skill. The "Scientist" needs to be able to remove themselves and apply clear analytical skill, but with a fundamental understanding of the subject.

So all those companies advertising for a Data Scientist, I think I have this to say...

  1. You want a Dancer not a Scientist.
  2. Good luck with that!

As for the future of (data driven) journalism...

"In order to grasp this fleeting object, which is not exactly an object, it is therefore necessary to situate oneself simultaneously inside and outside."

The "fleeting object, which is not exactly an object": that's your story. The flow of data will gather together now and then, forming a tangible shape for you to spot and grasp before it collapses back into the stream. You have to be in it to understand it, and outside to spot it. Just one or the other won't do.

And for not-journalists, it's your competitive advantage.

Further Reading

When the Data Scientist/Dancer has sufficiently honed their skill to identify useful shifts, patterns and rhythms in the data, they can then set up algorithms to spot these on their behalf.
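As a rough sketch of what "setting up an algorithm" might look like once you know the rhythm, the Python below learns a per-weekday baseline and flags values that break from it. Everything here is an assumption for illustration: the function names, the made-up numbers, and the choice of a simple standard-deviation threshold. Knowing that Monday and Friday have different normal levels is exactly the bit the Dancer has to bring.

```python
from collections import defaultdict
from statistics import mean, pstdev


def weekday_baseline(history):
    """Build {weekday: (mean, stdev)} from (weekday, value) observations."""
    by_day = defaultdict(list)
    for weekday, value in history:
        by_day[weekday].append(value)
    return {day: (mean(vals), pstdev(vals)) for day, vals in by_day.items()}


def is_unusual(baseline, weekday, value, threshold=3.0):
    """Flag a value that sits more than `threshold` stdevs from that day's mean."""
    avg, spread = baseline[weekday]
    if spread == 0:
        # No variation seen for this day: any different value counts as unusual.
        return value != avg
    return abs(value - avg) / spread > threshold


# Hypothetical traffic figures, invented purely for this example.
history = [
    ("Mon", 100), ("Mon", 110), ("Mon", 90),
    ("Fri", 40), ("Fri", 50), ("Fri", 45),
]
baseline = weekday_baseline(history)
```

With this history a value of 100 is ordinary on a Monday but stands out on a Friday, which a single global average would miss.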

To understand the usefulness of algorithms we should first fully understand Golems and Robots.

http://en.wikipedia.org/wiki/Golem http://en.wikipedia.org/wiki/R.U.R._(Rossum%27s_Universal_Robots)
