By: Jesse Cryderman
Big data has been with us for years; it just hasn't always been called that. Back in 2008, when smartphones were just beginning their meteoric rise in popularity, Google debuted a big data tool that was heralded as a poster child for the technology: Google Flu Trends (GFT). The tool tracked 45 flu-related search terms across billions of searches, monitoring trends and drawing correlations to predict flu outbreaks and their severity. Improving healthcare with smart number crunching: what's not to love? Well, a recent paper in Science pointed out a rather large un-lovable: GFT nearly always overestimates flu prevalence, at times by more than 50%.
According to the paper, "GFT overestimated the prevalence of flu in the 2012–2013 season and overshot the actual level in 2011–2012 by more than 50%. From 21 August 2011 to 1 September 2013, GFT reported overly high flu prevalence 100 out of 108 weeks."
The authors of the paper are proponents of big data solutions and did not approach the research with an agenda against it. Even so, they found that simply projecting the recent trend in C.D.C. reports of influenza-like illness from doctors, which lag by two weeks, would have been a more accurate predictor than Google Flu Trends.
This highlights a phenomenon some have called the "hubris of big data."
The ability to capture mountains of data in real time doesn't immediately translate into value or predictive power; in fact, sometimes it translates into excessive cost and incorrect guidance. Here are some common pitfalls, and some lessons that can be learned.
In the case of GFT, a major problem was its reliance on a single source, Google searches, as the foundation for analysis. "The mash-up is the way to go," said David Lazer, the Science paper's lead author. His analysis shows that combining Google Flu Trends with C.D.C. data, and applying a few simple adjustments, works best.
Effectively collecting and processing both structured and unstructured data, and embedding it into daily operations, is key to accelerating business. Accuracy and availability are critical; a decision based on incorrect, incomplete, or missing information can put your business at risk. GFT used one data source: user searches. Google's search algorithms change over time and vary with the person, the advertising, and other factors, so using search activity as the baseline for data collection didn't reflect reality.
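To make the mash-up idea concrete, here is a minimal sketch on synthetic data. It assumes a plain least-squares blend of two signals: a hypothetical search-based estimate that runs systematically high, and a hypothetical surveillance series that is accurate but two weeks stale. The names `fit_blend_weight` and `blend` are illustrative, not the method Lazer and colleagues actually used, and none of the numbers are real CDC or GFT figures.

```python
import math
import random


def mse(pred, actual):
    """Mean squared error between a forecast and observed values."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def fit_blend_weight(gft, cdc_lagged, actual):
    """Least-squares weight w minimising the error of w*gft + (1 - w)*cdc_lagged."""
    num = sum((a - c) * (g - c) for g, c, a in zip(gft, cdc_lagged, actual))
    den = sum((g - c) ** 2 for g, c in zip(gft, cdc_lagged))
    return num / den


def blend(gft, cdc_lagged, w):
    """Weighted combination of the two signals."""
    return [w * g + (1 - w) * c for g, c in zip(gft, cdc_lagged)]


# Synthetic weekly data (hypothetical): two years of a seasonal illness level.
rng = random.Random(42)
actual = [2 + 1.5 * math.sin(2 * math.pi * week / 52) for week in range(104)]
# Search-based signal: systematically too high (x1.5) and noisy.
gft = [1.5 * a + rng.gauss(0, 0.3) for a in actual]
# Surveillance signal: accurate but two weeks stale (first weeks padded with week 0).
cdc_lagged = [actual[0]] * 2 + actual[:-2]

w = fit_blend_weight(gft, cdc_lagged, actual)
combined = blend(gft, cdc_lagged, w)
print(f"blend weight on search signal: {w:.2f}")
for name, series in (("search-only", gft), ("lagged surveillance", cdc_lagged), ("blend", combined)):
    print(f"{name:>20} MSE: {mse(series, actual):.3f}")
```

Because w = 1 reproduces the search signal and w = 0 the lagged series, the fitted blend can never do worse than either source on the data it was fitted to. That comparison is in-sample only; a fair test would fit w on earlier weeks and score it on later ones.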
Another major issue with GFT was the way data was shoehorned into categories without enough consideration for causal relationships. "They overfit the data. They had fifty million search terms, and they found some that happened to fit the frequency of the 'flu' over the preceding decade or so, but really they were getting idiosyncratic terms that were peaking in the winter at the time the 'flu' peaks … but wasn't driven by the fact that people were actually sick with the 'flu'," Lazer said.
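Lazer's overfitting point can be illustrated with a small simulation. The sketch below uses only synthetic data: a hypothetical seasonal "flu" curve, and many random-walk series standing in for search terms that, by construction, have no causal link to illness. Screening more unrelated candidates against the same curve tends to turn up a better best-case fit, which is the trap of selecting terms purely by how well they match past data. The helper names are illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def random_walk(rng, length):
    """A series with no causal link to illness: a cumulative sum of Gaussian steps."""
    level, series = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        series.append(level)
    return series


def best_spurious_fit(target, n_terms, seed=0):
    """Screen n_terms unrelated series against target; return the best |correlation|."""
    rng = random.Random(seed)
    return max(abs(pearson(random_walk(rng, len(target)), target))
               for _ in range(n_terms))


# Hypothetical seasonal curve: two years of weekly values peaking once a year.
flu = [math.sin(2 * math.pi * week / 52) for week in range(104)]

for n in (10, 100, 1000):
    print(f"{n:>5} candidate terms -> best |r| = {best_spurious_fit(flu, n):.2f}")
```

With a fixed seed, each larger screen contains the smaller one's candidates, so the best correlation can only stay level or rise as more terms are tried. Any strong match here is pure coincidence, and a term picked this way has no reason to track real illness in a new season.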
Myth 1: You need all of the data
The foundations of statistical analysis did not change when hard drives became cheaper and more voluminous. Storing all data from all sources is costly and, counterintuitively, often unnecessary. A well-drawn random sample of the data is nearly always as accurate as wholesale collection, although many vendors don't want you to believe this.
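As a minimal sketch of why sampling works for aggregate questions, the example below estimates a mean from a 1% simple random sample of a hypothetical 200,000-record dataset and reports the standard error that tells you how close the estimate should be. The data are synthetic, and the example assumes the sample is truly random and the question is an aggregate such as an average; it does not speak to finding rare individual events.

```python
import math
import random
import statistics

rng = random.Random(7)

# Hypothetical "full" dataset: 200,000 synthetic per-user values (mean about 50).
population = [rng.expovariate(1 / 50) for _ in range(200_000)]
full_mean = statistics.fmean(population)

# Simple random sample of 1% of the records, drawn without replacement.
sample = rng.sample(population, k=len(population) // 100)
sample_mean = statistics.fmean(sample)

# Standard error of the sample mean; omitting the finite-population
# correction makes this slightly conservative.
std_err = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"full-data mean: {full_mean:.2f}")
print(f"1% sample mean: {sample_mean:.2f} (± {1.96 * std_err:.2f} at ~95% confidence)")
```

The standard error shrinks with the square root of the sample size, not with the size of the full dataset, which is why a modest random sample can answer the same question as the whole collection at a fraction of the storage and processing cost.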