These models are simple expressions of Big Data and what can be done with it. Baby steps. Data becomes of interest when it can be processed and analyzed. Data at rest, not yet analyzed, is potential value, much like potential energy. [I’m going to skip the example analogy of a hydroelectric dam storing water, as I’m sure you can walk through that in your own mind.] Mechanisms must exist that, when applied to that data, extract value from it. Your money in the bank is data at rest; it is worth something because you can use programs to manipulate the represented quantity in exchanges with other accounts.
Data at rest alone is a cost item. Money must be spent to capture and store it. Datacenters are capital and energy intensive. Data must be curated. In our industry, data frequently must be cleaned up. Data must be shared. Data must be transported from place to place over expensive data networks. Regulatory and privacy issues must be addressed. This is a long and ongoing list.
It is possible to think of the changes in information over time, for a specific grouping of things, as behavior. In most cases, it is not the data itself that is valuable; it is the behavior that can be extrapolated from the data. Data is valuable when, and only when, there are defined models that bring perspective to that data and allow behavior to be extrapolated from it. Better models make data more valuable. A simple example: knowing that one out of three of your customers has blue eyes is meaningless unless blue eyes are associated with your product or service. But knowing that a customer who watched “Big Bang Theory” this week will likely watch programs X, Y & Z later in the week can allow for targeted reinforcement ads. You extrapolate this by analyzing the collected viewing data with a specific model.
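To make the viewing example concrete, here is a minimal sketch of one very simple model: for everyone who watched a given show, estimate the share who also watched each other show. The function name `co_viewing_rates`, the viewer names, and the log records are all hypothetical, invented only for illustration; a real model would also need to account for timing, sample size, and how popular each show is overall.

```python
from collections import defaultdict


def co_viewing_rates(viewing_log, anchor_show):
    """Estimate, for each other show, the share of anchor_show viewers who also watched it."""
    # Group the raw (viewer, show) records into one set of shows per viewer.
    shows_by_viewer = defaultdict(set)
    for viewer, show in viewing_log:
        shows_by_viewer[viewer].add(show)

    # Keep only the viewers who watched the anchor show.
    anchor_viewers = [v for v, shows in shows_by_viewer.items() if anchor_show in shows]
    if not anchor_viewers:
        return {}

    # Count how many of those viewers also watched each other show.
    counts = defaultdict(int)
    for viewer in anchor_viewers:
        for show in shows_by_viewer[viewer] - {anchor_show}:
            counts[show] += 1

    return {show: n / len(anchor_viewers) for show, n in counts.items()}


# Hypothetical (viewer, show) records; not real data.
viewing_log = [
    ("ann", "Big Bang Theory"), ("ann", "Program X"), ("ann", "Program Y"),
    ("bob", "Big Bang Theory"), ("bob", "Program X"),
    ("cat", "Program Z"),
    ("dan", "Big Bang Theory"), ("dan", "Program X"), ("dan", "Program Z"),
]

rates = co_viewing_rates(viewing_log, "Big Bang Theory")
for show, rate in sorted(rates.items()):
    print(f"{show}: {rate:.2f}")
```

The raw log is the data at rest; the rates only exist once a model, here a crude conditional frequency, is applied to it. A better model applied to the same log would surface more useful behavior.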
The bottom line is that Big Data is a complex field that needs specific expertise to manipulate the data and extract value from it. It is not amenable to buying a product, deploying it, and applying it in the business. Our normal products already embed a model of the data they collect and manipulate. For the limited range of data they address, this worked as a way to encapsulate a priori, static models of which data was valuable and what a business could do with it. Big Data is fundamentally different. Big Data is about understanding relationships in the data and extracting information at a fundamental level. It is about finding new models. Data scientists use sophisticated mathematical approaches to find what is commonly called "hidden value". That hidden value is actually the bringing forth of new models of behavior from data analysis.
ICE companies must hire data scientists to gain value from Big Data. But not enough data scientists are being trained and graduated. McKinsey estimates a 50 to 60 percent gap between demand and the current production of talent in analytics, statistics and machine learning. The first and most direct way of expanding this pool of candidate employees is active partnerships with universities. It’s time to bring back internships, scholarships, and graduate sponsorships.
It is possible to view the universe as a soup of information. We understand and manipulate matter, energy, and space by adjusting the parameters used to measure them. Physics encapsulates the mathematics that accomplishes this. But since our experience of these forces and our characterization of them is data, one can extrapolate that the universe is information. Data is the characterization of information at an instant of measurement. But only that immediately measured, small part is captured. For the most abstract thinkers, we are just clumps of information within a soup of information. Further, inverting that characterization: if data is an instantaneous instance of continuous information, then the full description of all information in any data set is a mathematical statement; the full description of everything is a mathematical essay.