Big Data…or Deep Data?
I have a confession to make. In recent months and years I have written many articles - including here on Newsline - and delivered conference papers that extolled the benefits of Big Data in audience measurement.
I was wrong.
I wasn't mistaken to talk about the game-changing benefits of Return Path Data from Set Top Boxes, the advantages for audience measurement and the opportunities for combining audience data with customer databases to aid media planning. However, I was wrong to use those words: 'Big Data'.
What can I say? I didn't coin the phrase. All the big and cool kids were using it so I just joined the Greek chorus singing about big data, sexy data, data as the new black, the new oil. However, just because the phrase is convenient doesn't mean it is right. I am here to propose a new, more appropriate name for such datasets: Deep Data.
I first proposed the term sitting on MediaTel's 'The Future of Media Research' panel in the UK last month, more as a sound-bite than anything else. However, last week I attended the excellent Media Playground event - this time in the audience - and was frustrated that a panel member rolled out the hoary old classic - and I paraphrase - that in this day and age how on earth could billions of pounds of advertising be traded on a 'panel of just 5,000 homes'.
I had no chance to challenge him as the session wound up at that point. Arguably it's not his fault. After all big things must be better than small things, right?
...and therein lies the problem with the phrase 'Big Data'.
The panellist may have been bemused that such a 'small' panel could determine ad strategy, but David Cameron seems to regularly perform policy U-turns based on UK opinion polls of fewer than 1,000 respondents. Those polls are carefully sampled, balanced and weighted. No one would seriously contend that a political poll of a million Sky subscribers or Tesco clubcard holders would be more representative of the UK population as a whole. A bigger sample, yes, but not better.
Big data is not big data. It's deep data; data at a deeper level of individual granularity than we have ever had before. In theory not just channel changes but every click of the button, even volume levels.
However, that data will only tell you what users of that particular system did; not the bigger picture - how many also went on to watch online, or via free-to-air? What were their demos?
In the context of TV audience measurement, I would argue strongly that the phrase 'Big Data' actually applies more to the industry currencies - BARB in the UK for example - than it does to server data sets. PeopleMeter panels will struggle to measure all forms of TV consumption without incorporating RPD and server data, but that bigger picture will still be needed; the context is essential.
So here's a call to action.
Firstly, a campaign is needed to remind data users of the basic principles of statistical sampling. In that context, the most important arbiter of usability is not how many zeros are on the end of the sample size, but how balanced and representative the sample is.
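To make the sampling point concrete, here is a minimal simulation sketch. Every number in it is an assumption invented for illustration (a population of 200,000 homes, 40% of them subscribing to a single pay-TV platform, subscribers watching a given programme at 30% against 10% for everyone else); none of it comes from BARB, Sky or any real dataset. It compares a random sample of 1,000 homes drawn from the whole population with a sample of 50,000 homes drawn from subscribers only.

```python
import random


def simulate(seed=42, population_size=200_000):
    """Compare a small representative sample with a large unrepresentative one.

    All rates below are hypothetical, chosen only to illustrate the argument.
    """
    rng = random.Random(seed)

    # Build a made-up population of homes: (is_subscriber, watched_programme).
    population = []
    for _ in range(population_size):
        subscriber = rng.random() < 0.40                 # assumed platform share
        watch_rate = 0.30 if subscriber else 0.10        # assumed viewing rates
        watched = rng.random() < watch_rate
        population.append((subscriber, watched))

    # The figure we are trying to estimate: share of ALL homes that watched.
    true_rate = sum(w for _, w in population) / population_size

    # Small but representative: 1,000 homes drawn at random from everyone.
    small_sample = rng.sample(population, 1_000)
    small_estimate = sum(w for _, w in small_sample) / len(small_sample)

    # Big but unbalanced: 50,000 homes, all of them platform subscribers.
    subscribers = [home for home in population if home[0]]
    big_sample = rng.sample(subscribers, 50_000)
    big_estimate = sum(w for _, w in big_sample) / len(big_sample)

    return true_rate, small_estimate, big_estimate


if __name__ == "__main__":
    true_rate, small_estimate, big_estimate = simulate()
    print(f"True viewing rate:              {true_rate:.3f}")
    print(f"1,000 random homes estimate:    {small_estimate:.3f}")
    print(f"50,000 subscriber-only estimate: {big_estimate:.3f}")
```

Under these assumptions the 1,000-home sample should land close to the true figure, while the subscriber-only sample, fifty times larger, settles confidently on the wrong answer because it only ever sees one part of the population. Adding more subscribers would not help; the error comes from who is in the sample, not how many.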
This may sound patronising, but, believe me, there is a worrying lack of understanding here, as I heard a few days ago. As more and more projects are moved in-house and online, there is a real danger that clients are going to make - or have already made - some pretty disastrous decisions based on 'big' but unbalanced or incorrect data.
Secondly, let's agree to move on from the phrase Big Data. In the context of audience measurement it's misleading, particularly to non-statisticians who assume that if big things are good then small things must be bad, however perfectly formed they are.
I am proposing that we move to talking about Deep Data. Perhaps someone out there has a better term. However, let's stop this talk about 'Big Data' now. It won't be easy, but together we can do it:
"Hi, my name is Richard and it's been three weeks now since I used the term Big Data."
Richard Marks runs Research the Media.
I completely agree with Richard. Deep (Big) data is rich in detail for the specific source that generates it, but it needs the perfect form of the balanced and representative sample to be versatile and flexible for the required analysis.
It is vital that, in an age of massive data farmed by Facebook, Google et al, market researchers differentiate their data. The proposed term of "deep data" might be a very good start. The obsession with BIG shall also pass. But whilst it is there, we need to differentiate our industry from mindless data trawling.