By: Nancee Ruzicka
Because the volume of data currently available to service providers, not to mention the variety of sources, exceeds humans’ capacity to correlate and analyze all of it in a timely manner, the concepts of Big Data are becoming very real. The challenges surrounding it are many, but the most significant ones for companies involve, not surprisingly, scalability and deciding what exactly they want to learn from all that data. The technology that can search and sort through massive amounts of data is evolving, but the supporting cast is less capable than it needs to be, and none of the information means anything if we don’t ask the right questions.
Recall the children’s game of “telephone,” in which a secret message is whispered into the ear of one person, then another, and so on down the line until the last person is asked to reveal what he or she heard. The message that’s received seldom resembles the one that was sent. Applying analytics to Big Data requires data scientists and expert modelers to carefully craft the series of inquiries and correlations that find the data, detect patterns and analyze results in order to ultimately answer questions posed by marketing, product management or any other work group. But, as in the telephone game, the answers at the end of the line may not be accurate or even useful.
Analysts conduct a lot of surveys, and respondents, much like a data warehouse, answer the questions that are presented to them. The difference is that in a live interview I can explain a question and provide context to help the interviewee understand my research and deliver a valid response. But if a survey is conducted online, each respondent is left to his or her own interpretation of a question, and though the respondent’s answer may be truthful, it may not reflect the original intent of my query.
Whether conducting a survey or searching a data warehouse, it’s important to properly frame questions and provide context that prevents misunderstanding. Questions that are too broad or too vague won’t lead to revealing answers. Questions that are too involved or have too many qualifiers may not deliver accurate results. And sometimes your questions just won’t make any sense to a data modeler.
To be of value, questions must be specific to a business user, and although data scientists and data modelers are on board to develop models and configure tools, that doesn’t mean they can identify every type and source of data that is necessary to get the job done. Data scientists and modelers know how to analyze structured and unstructured data from multiple sources, but they don’t necessarily know what they’re looking for or where to look for it; keywords aren’t always enough to detect that elusive gold nugget. It takes collaboration between the data modeler and the business user who’s trying to answer a specific question to determine which structured and unstructured data are important and in what context.