Three things came up this Thanksgiving, all relevant to this blog.  Most importantly, my family stepped back from the brink of gluttony and worked out a couple of manageable meals.  I felt merely full, gaining merely four pounds, rather than bursting at the seams and praying for sweet release.  I shall write a bit about coming back from the brink at a later date.  The second was the completion of my book draft.  Many parts of this book have been seen only by me and my dad.  Let us see what professional editors make of it.  The third, for some reason, was Big Data.


My wife, with a mind for business, scoffs at “Big Data” as yet another over-hyped buzzword.  Her background is in marketing, where complete datasets of consumer behavior are easy to come by.  A marketer can tell you exactly how much their consumers spent on their goods or services, and chart that performance against ad campaigns, product launches or other events.  Knowing the population's shopping behavior is not magic to marketers.

The promise of big data is the same as it is for marketers: knowledge of the population, not just a sample.  Samples are skewed, not only in their design but also in who chooses to join them.  Surveys are answered by those with the time or the grievance to actually respond, leaving the blithely satisfied voiceless.  If we could collect the full set of behaviors, we wouldn’t need statistics to infer the population; we would know the population.

The way we collect big data is through a continuous feed of information, mostly from smartphones.  We can’t rely on people to fill out survey after survey, especially not long ones, to get the kind of atomic answers we need for Big Data.  We just want to collect data as people move about their day: where somebody goes, when they go, whether they hit a pothole, and how long they linger.  The trick with big data is not in the statistics, but in extracting meaning and information from the firehose of data produced this way.

This is in contrast to the way we used to collect knowledge.  Aside from surveys, we worshiped the opinions of experts with experience.  Experience is the accumulation of thousands of painful or ecstatic right and wrong decisions, forming an “instinct” for what will and won’t work.  This repeated cycle of pain and pleasure is what best suits us to our environs.  Big data is not so hidebound, as it uses all the data at hand, not just the data within sight of one observer.  It has no environs to describe; it is the environs.

The weakness of big data is that it can only tell us what is; it cannot tell us what might be.  Past data and patterns are a poor predictor of the future, even if we have all the data at hand.  This is because billions of willful actors each act on their own perception of the right actions to maximize their lot.  Though big data measures the results of these struggles well, it cannot predict the curveballs we might throw in an effort to jump the line.


Next up, I think I’ll write a bit about copyright.

*I assure you I did not write the book on the train, as I did this piece.  Now that I’m sitting still, I can edit the blasted thing.