Quentin Hardy's recent post in the Bits blog of the New York Times touched on the gap between representation and reality that is a core element of practically every human enterprise. His post is titled "Why Big Data is Not Truth," and I recommend it for anyone who feels like joining the phony argument over whether "big data" represents reality better than traditional data.
In a nutshell, this "us" vs. "them" approach is like trying to pick a fight between oil painters and watercolorists. Neither oil painting nor watercolor is "truth"; both are forms of representation. And here's the important part: Representation is exactly that -- a representation or interpretation of someone's perceived reality. Pitting "big data" against traditional data is like asking whether Rembrandt is more "real" than Gainsborough. Both were artists, and both painted representations of the world they perceived around them.
The problem with false arguments like the one posed by Hardy is that they obscure the value of data -- traditional data and big data -- and the impact of data on our culture. I'm now working my way through the anthology "Raw Data" Is an Oxymoron, a collection of short essays about data. I recommend it for anyone who is seriously interested in thinking about the many ways in which data has influenced (and continues to influence) our lives. I especially recommend "facts and FACTS: Abolitionists' Database Innovations" by Ellen Gruber Garvey. As its title suggests, the essay focuses on a fascinating period of U.S. history in which the anti-slavery movement harvested data from real advertisements in Southern newspapers to paint a vivid and believable picture of the routine horrors inflicted by the slave system on real human beings.
That 19th-century use of data mining built support for the anti-slavery movement, both in the U.S. and in England. The data played a key role in making the case for abolishing slavery -- even though it required the bloodiest war in U.S. history to make abolition a fact.
Data itself has no quality. It's what you do with it that counts.