4 Maturity Levels of Data Analytics
Quite recently, I was asked how to address the issue of a Kubernetes cluster struggling to scale quickly enough to meet sudden surges in demand, which occur from time to time. This brought to mind several usual suspects in such cases, including but not limited to incorrect autoscaler configurations, insufficient resources on existing nodes, network limitations, control plane throttling, and even cloud service provider quotas and rate limits. However, I didn’t want to bombard them with all these possibilities without any kind of critical thinking. Instead, I suggested framing the issue through the lens of data.
“Considering the issue as a ‘data question’ shifts the approach from reactive troubleshooting to a more systematic and measurable proactive investigation”.
Viewing the issue as a ‘data question’ transforms the approach from a reactive troubleshooting process — where symptoms are addressed as they arise without fully understanding their underlying causes — to a systematic and structured proactive investigation. This means relying on measurable evidence, such as logs, metrics, traces, and historical trends, to identify patterns, diagnose root causes, and guide decision-making. The focus shifts to analysing data to form hypotheses, validating them through observations, and implementing targeted solutions that can be tested and tracked for effectiveness rather than depending on trial-and-error troubleshooting. This is where data analytics comes into play.
Data analytics, as an overarching term, includes data analysis as a key subcomponent. However, it goes beyond analysis by emphasising the extraction of actionable insights, often leveraging advanced techniques such as predictive modelling and machine learning. Those who are familiar with the field will recognise that it is categorised into four maturity levels, representing a progression from understanding data to driving data-informed actions.
“While Data Analysis focusses on exploring and understanding data, Data Analytics involves using tools and techniques to derive insights, often with a forward-looking approach”.
1) Descriptive Analytics: This is the foundational maturity level that focuses on analysing historical data to understand events, answering the question: ‘What happened?’. It serves as the basis for more advanced analytics. Common tools and techniques at this level include data aggregation, summarisation, visualisation, and basic statistical analysis.
Returning to the original Kubernetes issue, the goal should be to find answers to the following questions within the collected data; a minimal sketch of this step follows the list. This lays the groundwork for moving into diagnostic analytics, where you explore why these problems occurred.
- What was the timeline of the issue?
- What specific events were triggered during the scaling process?
- What were the resource utilisation levels?
- What was the status of the pods?
- What scaling events were logged?
- What was the network performance?
- What patterns can be observed in historical scaling events?
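To make the descriptive step concrete, here is a minimal sketch in Python of how the collected metrics could be aggregated and summarised. It assumes the relevant signals have already been exported to a CSV file (for example from Prometheus or another monitoring stack); the file name and column names are purely illustrative.

```python
# Descriptive analytics sketch: summarise historical scaling data.
# The CSV export, file name, and column names are hypothetical; substitute
# whatever your monitoring stack (e.g. Prometheus) actually exposes.
import pandas as pd

# Hypothetical export: one row per minute with CPU utilisation,
# pending pods, and nodes added.
df = pd.read_csv(
    "scaling_metrics.csv",
    parse_dates=["timestamp"],
    index_col="timestamp",
)

# 'What happened?' -- aggregate the raw samples into an hourly summary.
hourly = pd.DataFrame({
    "cpu_util_avg": df["cpu_utilisation"].resample("1h").mean(),
    "cpu_util_p95": df["cpu_utilisation"].resample("1h").quantile(0.95),
    "pending_pods_max": df["pending_pods"].resample("1h").max(),
    "nodes_added": df["nodes_added"].resample("1h").sum(),
})

# Surface the hours with the largest pod backlog: this gives the timeline
# of the incident and highlights earlier, similar episodes.
print(hourly.sort_values("pending_pods_max", ascending=False).head(10))
```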
2) Diagnostic Analytics: Building on descriptive analytics, this level examines data to identify the root causes of events or trends, answering the question: ‘Why did it happen?’. It provides insight into causation, helping to understand relationships and patterns within the data. Drill-down analysis, correlation analysis, time series analysis, and statistical modelling are some of the widely used methods.
Below are some diagnostic questions tailored to our Kubernetes use case; a brief sketch of how to start answering them follows the list. Answering them will help identify the root causes of the scalability problem, such as misconfigurations, infrastructure constraints, or network bottlenecks. This sets the stage for actionable improvements, which can then be guided by predictive and prescriptive analytics.
- Why did the autoscaler fail to scale quickly enough?
- Why did the infrastructure fail to respond in time?
- Why was the control plane slow or unresponsive?
- Why were network-related delays observed?
- Why were certain configurations ineffective?
- Why were similar surges handled better (or worse) in the past?
- Why did any alerts or monitoring fail to notify about the issue earlier?
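As a starting point for the ‘why’ questions, a simple correlation and lag analysis across the signals gathered in the descriptive step can reveal whether the pod backlog is driven by demand spikes, by slow node provisioning, or both. The sketch below is illustrative only; the column names are assumptions about what your monitoring stack exports.

```python
# Diagnostic analytics sketch: correlate demand with scaling behaviour.
# Column names (request_rate, pending_pods, node_ready_latency_s) are
# hypothetical; substitute whatever your monitoring stack exports.
import pandas as pd

df = pd.read_csv(
    "scaling_metrics.csv", parse_dates=["timestamp"], index_col="timestamp"
)
df = df.resample("1min").mean().interpolate()

# How strongly does the pending-pod backlog track incoming demand and
# node provisioning latency? (correlation analysis)
print(df[["request_rate", "pending_pods", "node_ready_latency_s"]].corr())

# How long after a demand spike does the backlog peak? (lagged correlation,
# a crude drill-down into whether provisioning lag is the bottleneck)
for lag in range(0, 16):
    corr = df["request_rate"].corr(df["pending_pods"].shift(-lag))
    print(f"lag {lag:>2} min: correlation with pending pods = {corr:.2f}")
```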
3) Predictive Analytics: This maturity level takes diagnostic analytics one step further, leveraging patterns and trends within historical data to predict future outcomes, answering the question: ‘What will happen?’. It enables us to anticipate and prepare for future events using techniques such as machine learning and time-series forecasting. However, since predictions and forecasting are involved, it’s important to understand that this is an iterative process that improves over time, albeit with an inherent error rate.
To anticipate future issues related to our Kubernetes problem, we should ideally seek answers to the following questions (a minimal forecasting sketch follows the list). This will pave the way for prescriptive analytics, which will recommend actionable solutions to optimise scaling and prevent problems before they occur.
- What are the predicted traffic patterns over time?
- What workloads are likely to grow or shrink?
- How quickly will the cluster scale under different demand scenarios?
- What are the probabilities of resource exhaustion or quota breaches?
- What early warning signs predict scaling issues?
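For illustration, the sketch below implements a deliberately simple seasonal baseline forecast of request rate, with a small backtest to track its error rate; in practice you might swap in a proper time-series model or a machine-learning pipeline. The data file and column name are hypothetical.

```python
# Predictive analytics sketch: a seasonal baseline forecast of demand.
# The file name and column are hypothetical; a dedicated time-series model
# would normally replace this baseline once it proves insufficient.
import pandas as pd

rate = (
    pd.read_csv("request_rate.csv", parse_dates=["timestamp"], index_col="timestamp")
    ["requests_per_second"]
    .resample("1h")
    .mean()
)

# Hold out the last day for validation and build an hour-of-week profile
# from the preceding weeks (captures the daily/weekly traffic pattern).
train, test = rate.iloc[:-24], rate.iloc[-24:]
profile = train.groupby([train.index.dayofweek, train.index.hour]).mean()

# Backtest: predict the held-out day and measure the error rate, then
# reuse the same profile to anticipate the next comparable day's peak.
predicted = pd.Series(
    [profile.loc[(ts.dayofweek, ts.hour)] for ts in test.index], index=test.index
)
mape = ((predicted - test).abs() / test).mean()
print(f"backtest MAPE: {mape:.1%}")
print(f"expected peak for a comparable day: {profile.max():.0f} req/s")
```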
4) Prescriptive Analytics: The most advanced level of maturity, prescriptive analytics aims to provide actionable recommendations or informed decisions based on predictive insights, answering the question: ‘What should we do?’. It leverages findings from the earlier analytics stages to suggest the best course of action. In essence, it determines what steps should be taken now that we understand what is likely to happen in the future.
Regarding our specific Kubernetes scalability use case, relevant prescriptive analytics questions revolve around fine-tuning autoscaler configurations, adjusting resource management strategies, optimising the infrastructure, and implementing proactive monitoring and alerting, as outlined below; a small sketch that turns the earlier forecast into concrete recommendations follows the list.
- What are the optimal autoscaler settings to handle surges better?
- What node pool configurations will improve scalability?
- How can we prevent resource bottlenecks?
- What changes to workload distribution can improve performance?
- How can we address cloud service provider constraints?
- What tools or features should we adopt for better scalability?
- What monitoring and alerting thresholds should be implemented?
- How can we automate incident response?
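To close the loop, here is a minimal sketch of how the predicted peak could be translated into concrete, reviewable recommendations for the autoscaler and the node pool. The per-pod capacity, pods-per-node figure, and headroom factor are assumed values for illustration, not measured ones.

```python
# Prescriptive analytics sketch: turn the predicted peak into concrete,
# reviewable recommendations. The per-pod capacity, pods-per-node figure,
# and headroom factor below are illustrative assumptions, not measured values.
import math

forecast_peak_rps = 4200        # peak demand predicted by the previous step
requests_per_pod = 150          # sustained req/s one replica can serve (assumed)
pods_per_node = 12              # schedulable pods per node for this workload (assumed)
headroom = 1.3                  # safety margin to absorb forecast error

peak_pods = math.ceil(forecast_peak_rps * headroom / requests_per_pod)
peak_nodes = math.ceil(peak_pods / pods_per_node)

# 'What should we do?' -- e.g. raise the HPA floor ahead of the predicted
# surge and keep warm node capacity so the cluster autoscaler isn't still
# provisioning machines while pods are already pending.
print(f"suggested HPA maxReplicas            : {peak_pods}")
print(f"suggested pre-surge HPA minReplicas  : {math.ceil(peak_pods * 0.5)}")
print(f"suggested node pool size (with spare): {peak_nodes + 1}")
```

Whatever numbers such a calculation produces, they are prompts for review and load testing, not a configuration to apply blindly.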
Discussion
In this particular use case, my systematic and structured approach greatly helped me think through the issue in depth. Asking those questions (and many others not included here) reduced ambiguity and clarified the missing pieces of the puzzle. However, I wouldn’t say it’s always the best approach to follow. For many problems, going through all these stages can be overkill; the effort should be proportionate to the complexity of both the problem and its solution.
You must strategically evaluate how much value can be derived relative to the time and resources required to reach each maturity level. Value comes at a price: each level builds upon the previous one, demanding more advanced techniques and tools. While the ultimate goal may be to achieve prescriptive analytics and proactively influence actions with data-driven insights, descriptive and diagnostic analytics will often be sufficient for many real-world use cases. Avoid going down the rabbit hole unnecessarily, just for the sake of appearing sophisticated. Simplicity is often the most effective path to solving a problem.
Additionally, don’t overlook the value of leveraging off-the-shelf tools and features where appropriate. They are often designed to address common challenges and can save you significant time and effort in solving your problem. Instead of reinventing the wheel or diving into overly complex solutions, take the time to explore what’s already available. Sometimes, the answer you’re looking for might already be there, waiting for you to just uncover it. A simple configuration change, an unexplored feature, or an existing tool might hold the key to solving your issue faster and more efficiently than anticipated. Embracing these options can not only save time but also improve reliability by taking advantage of proven, well-tested approaches.
Conclusion
Data analytics transforms raw data into meaningful, actionable insights, enabling us to solve real-world problems. In this post, I demonstrated how a common tech issue can be approached through the lens of data. After framing the issue as a ‘data problem’, I introduced a widely recognised model that describes the four maturity levels of data analytics. This framework helps in understanding the issue in depth and formulating a solution that suggests the best course of action.
Consider adopting a similar approach when dealing with a highly ambiguous and complex problem that can be supported by significant amounts of high-quality data, especially when a reactive process would not be sufficient. And feel free to share your thoughts in the comments below!