Operational applications, and by extension the data that they create, are very different. They are highly interconnected and built to capture every ounce of data from the transaction taking place. All these connections create a massive mess of data that is spread across multiple complex applications that must run in real time. Rather than querying a clean, straightforward, well-organized table, analyzing a single financial transaction might require data from upwards of 50 distinct tables in the backend ERP database, often with multiple lookups and calculations.
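To make that concrete, here is a minimal sketch of the shape of such a query. The schema is a hypothetical stand-in of my own (sales_order, customer, order_item, material_price); a real ERP backend would fan the same question out across many more lookup and configuration tables.

```python
import sqlite3

# Toy schema standing in for an ERP backend. Four tables are enough to
# show the shape of the problem; a production system would spread this
# information across dozens. All names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_order    (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE customer       (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE order_item     (item_id INTEGER PRIMARY KEY, order_id INTEGER,
                                 material_id INTEGER, quantity INTEGER);
    CREATE TABLE material_price (material_id INTEGER PRIMARY KEY, unit_price REAL);

    INSERT INTO sales_order    VALUES (1, 100);
    INSERT INTO customer       VALUES (100, 'Acme Corp', 'EMEA');
    INSERT INTO order_item     VALUES (10, 1, 500, 3), (11, 1, 501, 1);
    INSERT INTO material_price VALUES (500, 25.0), (501, 120.0);
""")

# Reconstructing the value of one transaction already needs three joins
# and an aggregation; each extra attribute of the transaction pulls in
# further lookups.
row = conn.execute("""
    SELECT c.name, c.region,
           SUM(oi.quantity * mp.unit_price) AS order_value
    FROM sales_order    AS so
    JOIN customer       AS c  ON c.customer_id  = so.customer_id
    JOIN order_item     AS oi ON oi.order_id    = so.order_id
    JOIN material_price AS mp ON mp.material_id = oi.material_id
    WHERE so.order_id = 1
    GROUP BY c.name, c.region
""").fetchone()

print(row)  # ('Acme Corp', 'EMEA', 195.0)
```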
In these circumstances, asking a question that needs to cover tens or hundreds of tables and relationships takes a lot of time. It means writing ever more complex queries to get the results. And it means waiting for those results to come back, during which time a user cannot make an informed decision. Consequently, analytics programs concentrate on the specific questions that users already know they need answered, rather than encouraging experimentation.
With this approach, answering more complex questions means creating increasingly complex queries across your pipeline, which takes more time. It means engineering your data pipelines to route data into progressively simplified business views that are linked together. This makes those queries possible to run, but it takes more work to simplify and reshape the data. In effect, those questions are formulaic and cannot respond to the real needs that operational staff have.
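As an illustration, here is a sketch of what one of those pipeline steps might look like, reusing the hypothetical schema from above: the normalized operational tables are flattened into a single pre-joined business view for downstream dashboards to query. Any question the view does not anticipate means coming back and re-engineering this step.

```python
import pandas as pd

# Hypothetical pipeline step: flatten normalized operational tables into
# one pre-joined "business view". The view is easy to query, but it only
# answers the questions it was designed for.
orders    = pd.DataFrame({"order_id": [1], "customer_id": [100]})
customers = pd.DataFrame({"customer_id": [100], "region": ["EMEA"]})
items     = pd.DataFrame({"order_id": [1, 1], "material_id": [500, 501],
                          "quantity": [3, 1]})
prices    = pd.DataFrame({"material_id": [500, 501], "unit_price": [25.0, 120.0]})

order_value_view = (
    items.merge(prices, on="material_id")
         .assign(line_value=lambda df: df.quantity * df.unit_price)
         .merge(orders, on="order_id")
         .merge(customers, on="customer_id")
         .groupby(["order_id", "region"], as_index=False)["line_value"].sum()
)

print(order_value_view)
#    order_id region  line_value
# 0         1   EMEA       195.0
```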
Asking a new question means going back to the source system and rebuilding the pipeline to answer it. This is time-consuming and expensive, and it does not encourage any form of experimentation. Worse, any change to the component applications, such as ERP or finance, can break the pipeline completely.
To solve this problem around operational data, we must design our approach around connected data from the start. In practice, this means addressing the fundamental data problems that operational staff want help with, and then making that data accessible to them.
With a data pipeline model, any kind of complex analysis will affect the applications that create that data in the first place. Instead, we must look at how to optimize the process when data is imported, before any queries are made. Gartner has termed this approach analytics query acceleration (AQA), and it provides a useful shortcut to help companies make data more available to their teams.
AQA takes a different approach to the pipeline model: it scans the entire data set that the company has. Before any query is run, the data set has been prepared to take its connections into account. Because it does not reformat the data to fit a pipeline, you can ask the questions that you actually want to, rather than being limited to what the pipeline was designed for in advance. This approach also lets you run those queries on the full volume of raw business data available for exploration, rather than relying on updates to the data set that are processed in batches.
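As a conceptual sketch only, on my own assumption of what preparing connections in advance boils down to (rather than any specific product's implementation): the links between records are resolved once, at import time, so that ad hoc questions can follow them directly at query time instead of re-joining the raw tables for every query.

```python
from collections import defaultdict

# Raw operational records, kept in their native shape. Names are
# hypothetical, continuing the earlier example.
orders = [{"order_id": 1, "customer_id": 100}]
items  = [{"order_id": 1, "material_id": 500, "quantity": 3},
          {"order_id": 1, "material_id": 501, "quantity": 1}]

# Import step: resolve the connections once, up front.
items_by_order = defaultdict(list)
for item in items:
    items_by_order[item["order_id"]].append(item)

# Query time: an ad hoc question traverses the prepared links directly,
# without reshaping the data to fit a predefined pipeline first.
for order in orders:
    linked = items_by_order[order["order_id"]]
    print(order["order_id"], [i["material_id"] for i in linked])
# 1 [500, 501]
```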
Running queries on raw data means that you can avoid the need to extract, transform and aggregate data to make it useful to your teams. At the same time, leaving data as close to its native format as possible makes it easier to explore that data and experiment, or simply to ask more questions. By acting on a digital twin of the real data, rather than a heavily modified version of that information, you can get a better understanding of how results were generated. This is especially useful if you want to provide more self-service analytics to your business users and see how they actually work with data.
Running a data pipeline is a fundamental assumption for many data analytics projects. We expect to have to implement a pipeline to acquire, process, and analyze data. It is useful for data warehousing and for specific data science projects. But it is not the right approach for everything. With operational data, users need to answer their questions in the moment, and those questions might not be defined from the start.
By applying the right approach, operational analytics helps front-line staff to ask the questions that they want answers to. These can change in real time based on the data that is coming in. This flexibility is important, as it opens up more opportunities. If a single query takes hours to run, you tend to stick to the specific questions you know you need answered. However, when staff can run multiple queries on data that is as up to date as possible, and get results back in minutes or seconds, the volume and variety of questions goes up massively. This can lead to more insights and, more importantly, better decisions.
To make the most of this change, a different mindset is needed. For example, we must open up access to operational data so that it can be queried at the moment it is needed. Alongside this, we can use that data in more experimental ways. Line-of-business teams need help to understand how to make the best use of data to answer their questions. With the AQA approach in place, teams can improve their operational decisions over time. There are multiple types of data that users need to engage with, and the tools used to analyze one type of data are not necessarily right for the others. That is why thinking through the goals, objectives and overall approach is necessary to succeed.