Gregory Kanevsky

The Role of Small Data and Vacation Recap Example

Blog Post created by Gregory Kanevsky on Jul 8, 2017

In the grand scheme of big data things, small data is the last mile of data science analysis. It still requires interpretation (or representation) in the form of a visualization or an application.

Indeed, Wikipedia defines small data as data that is 'small' enough for human comprehension, but then it goes further by qualifying it as data in a volume and format that makes it accessible, informative and actionable. I am not certain the latter is always true: a smaller footprint doesn't automatically make data informative and actionable without more work. In my book, small data usually scales to kilobytes and has just a handful of dimensions. But its main feature remains human comprehension, which really means there is a simple story behind it.


A case in point is the Google spreadsheet I created this summer while on vacation in Italy. Initially it contained the daily miles and steps walked, and later I added the main attractions for each day. The result was my personal small data covering about two weeks of touring Italy, with bases first in Rome and later in Sicily (this sentence was the story):



As-is, this spreadsheet is destined for the Google archives, contributing to the ever-growing collection of docs I created and happily forgot about. So I created this visualization, which represents most of the data as well as the story:


Had I kept a more detailed log, I would have ended up with more dimensions to use: for example, miles traveled by car or train, time spent at leisure versus touring, the number of cities and places visited, historical marker attributes and so on. But that moves us further away from the small data domain, as the footprint and dimensions grow and the story becomes less comprehensible. Another indicator is that the data becomes harder to collect manually. Instead, there are apps that would do it for me, for example Life Cycle or Apple Health.

Ultimately, any big data problem is reduced to one or more small data problems through aggregations, regressions, clustering or some other data science method. The path to big data insights is a journey from big to small data in search of a simple story. So learning how to deal with small data is where it all both ends and begins.
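To make the aggregation step concrete, here is a minimal sketch of how many raw readings (the kind an activity-tracking app might log throughout the day) collapse into a small table with one row per day, much like my spreadsheet. The readings, the `(day, steps, miles)` layout and the `daily_totals` helper are hypothetical and invented purely for illustration; they are not my actual vacation data or any app's export format.

```python
from collections import defaultdict

# Hypothetical raw log: (day_number, steps, miles) readings, several per day.
# All values are made up for illustration only.
raw_readings = [
    (1, 1200, 0.5),
    (1, 3400, 1.5),
    (1, 2900, 1.25),
    (2, 5100, 2.25),
    (2, 4800, 2.0),
]


def daily_totals(readings):
    """Aggregate many (day, steps, miles) readings into one row per day."""
    totals = defaultdict(lambda: [0, 0.0])  # day -> [total steps, total miles]
    for day, steps, miles in readings:
        totals[day][0] += steps
        totals[day][1] += miles
    # Return a small, sorted table: (day, total_steps, total_miles).
    return [(day, s, m) for day, (s, m) in sorted(totals.items())]


for day, steps, miles in daily_totals(raw_readings):
    print(f"day {day}: {steps} steps, {miles} miles")
```

The same reduction scales in spirit to much larger logs: a long stream of readings becomes a handful of rows that a person can read at a glance and turn into a story.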