Jan 10, 2024

How AI can change the game in the database and streaming system optimization field

The net result of an AI-powered optimization process is not only a better data experience, but also increased developer productivity. Once teams are aware of what can be automated, the improvements not only speed things up but also reduce costs.

Jonah Kowall, VP Product Management


Optimizing databases has been a constant struggle from the early days of DBAs working on mainframes, through the transition to distributed systems, and into the present. Today's engineers manage a large and diverse set of data systems and data architectures. As cloud platforms were introduced, that diversity exploded, along with heavy use of data streaming technologies. Managing it quickly outpaced what a single person, or even a team of experts, could handle.

The current state of optimization

Today, the availability of tools that suggest improvements or optimizations is limited. The choice is between a few commercial offerings, some rather stagnant open source tools, and several extensions to popular database platforms that offer a diverse set of optimization capabilities. When it comes to optimization, there are two approaches: automated and non-automated.

On the non-automated side, most databases provide basic tooling that lets programmers build performance visibility and (possibly) optimization automation. These foundational tools are included in commercial offerings or come from the community for open source data technologies. They enable profiling, monitoring, vacuuming, and tweaking the configuration of the runtime itself. The most innovative technologies also indicate how you might optimize a query or when indexes should be added. These are largely centered around relational databases and require an understanding of database internals.
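
As a concrete illustration, here is a minimal sketch of the kind of visibility this tooling provides, using PostgreSQL as the example. It assumes the pg_stat_statements extension is enabled and psycopg2 is installed; the connection string and the query are placeholders.

```python
# A minimal sketch of manual performance visibility on PostgreSQL.
# Assumes pg_stat_statements is enabled; connection details are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=app user=app_user host=localhost")
with conn.cursor() as cur:
    # Ask the planner how it executes a query (ANALYZE actually runs it).
    cur.execute(
        "EXPLAIN (ANALYZE, BUFFERS) "
        "SELECT * FROM orders WHERE customer_id = 42"
    )
    for (line,) in cur.fetchall():
        print(line)

    # Surface the most expensive statements to decide what to tune first.
    # Column is total_exec_time on PostgreSQL 13+ (total_time on older versions).
    cur.execute("""
        SELECT query, calls, total_exec_time
        FROM pg_stat_statements
        ORDER BY total_exec_time DESC
        LIMIT 5
    """)
    for query, calls, total_ms in cur.fetchall():
        print(f"{total_ms:10.1f} ms  {calls:6} calls  {query[:60]}")
conn.close()
```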

On the automated side, there are open source extensions and tools that autonomously build indexes to optimize performance, along with capturing metrics to drive and test the automation. Beyond that, little is automated unless you enlist commercial software. There are companies in that space that automate index selection, including the [EverSQL by Aiven](https://eversql.com/) technology, which we are building into the platform. The other aspect is the configuration of the given system. These tools look at the settings on a specific database, but tend not to look at the operational nature of the database workload, even though tuning configurations can have a big impact on performance. Keep in mind, all of these techniques focus only on today's distributed relational databases.
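
EverSQL's actual analysis is far more sophisticated and not public, but a toy heuristic gives the flavor of automated index selection: flag tables that PostgreSQL scans sequentially far more often than it reads via an index. The thresholds below are arbitrary illustrations, not tuning advice.

```python
# A toy index-candidate heuristic (NOT how EverSQL works): tables with many
# more sequential scans than index scans may be missing a useful index.
import psycopg2

conn = psycopg2.connect("dbname=app user=app_user host=localhost")
with conn.cursor() as cur:
    cur.execute("""
        SELECT relname, seq_scan, idx_scan
        FROM pg_stat_user_tables
        WHERE seq_scan > 1000
          AND seq_scan > 10 * COALESCE(idx_scan, 0)
    """)
    for table, seq, idx in cur.fetchall():
        print(f"{table}: {seq} seq scans vs {idx or 0} index scans -- "
              "consider indexing its most common WHERE columns")
conn.close()
```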

Applying new techniques to solve this for streaming systems

In the future, we will see broad application of AI to solve these same problems for relational databases. But the bigger dataset today flows through streaming systems, such as the popular Apache Kafka, which have their own complex tuning and optimization needs. Examples of Kafka tuning include automatically partitioning data across a cluster of Kafka servers to keep the workload balanced, applying configuration settings to Kafka topics to set the right retention, and automating the use of tiered storage. At an even lower level, we can change the segment sizes and compression of topics to squeeze further performance out of a Kafka cluster of a given size.
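
To make those topic-level settings concrete, here is a hedged sketch using the kafka-python admin client. The broker address, topic name, and values are illustrative, not recommendations.

```python
# A sketch of applying retention, segment size, and compression settings
# to one topic with kafka-python. Broker, topic, and values are placeholders.
from kafka.admin import KafkaAdminClient, ConfigResource, ConfigResourceType

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# Note: alter_configs is the non-incremental Kafka API, so unspecified
# dynamic topic configs on this topic are reset to their defaults.
admin.alter_configs([
    ConfigResource(
        ConfigResourceType.TOPIC,
        "clickstream",
        configs={
            "retention.ms": str(7 * 24 * 60 * 60 * 1000),  # keep 7 days
            "segment.bytes": str(256 * 1024 * 1024),       # 256 MiB segments
            "compression.type": "zstd",                    # trade CPU for disk/network
        },
    )
])
admin.close()
```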

Outside of Kafka, there are immense datasets in non-relational databases. One example we see in the community is the opportunity to improve OpenSearch performance. There is currently a lack of products aimed at optimizing OpenSearch; however, some enhancements in commercial software are driving performance improvements by optimizing mappings and the usage of templates. Settings on specific indices, such as the refresh interval and the number of shards and replicas, can be tuned to optimal values. If you are on Aiven, optimization can be taken a lot further. One possible solution for write back pressure on OpenSearch clusters is to use Kafka as a buffer: Kafka queues the incoming data and reduces the load on OpenSearch. This way, we can avoid paying for bigger OpenSearch clusters and save money.
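
As an illustration of those index-level settings, here is a minimal sketch using the opensearch-py client. The host, index name, and values are placeholders; shard and replica counts must be chosen when the index is created, while the refresh interval can be relaxed on a live index to ease write pressure.

```python
# A minimal sketch of tuning OpenSearch index settings with opensearch-py.
# Host, index name, and values are placeholders to adapt to your workload.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Shard and replica counts are fixed at creation time for a new index.
client.indices.create(
    index="logs-2024.01",
    body={"settings": {"number_of_shards": 3, "number_of_replicas": 1}},
)

# The refresh interval can be changed later to trade search freshness
# for indexing throughput.
client.indices.put_settings(
    index="logs-2024.01",
    body={"index": {"refresh_interval": "30s"}},
)
```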

Once the work is done to optimize the infrastructure, we can focus on tuning the queries that hit the data stores. We can apply the lessons learned in the relational database space to drive optimization in other spaces like OpenSearch. By adjusting caching parameters or providing users with filter suggestions, we can further enhance query performance. Furthermore, based on the specifics of the workload, we can start tuning low-level settings in the OpenSearch JVM, such as the heap size. Finally, we can apply best practices to indices, including doc_values and _source settings, which can improve query performance based on how the data is structured on disk.
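
For example, a hedged sketch of those mapping-level best practices with opensearch-py: the index and field names are hypothetical, and whether disabling doc_values is safe depends entirely on how each field is actually queried.

```python
# An illustrative mapping that applies the doc_values and _source practices
# mentioned above. Index and field names are hypothetical examples.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="events",
    body={
        "mappings": {
            # Keep _source enabled (the default) so documents can be
            # reindexed and returned in full from search hits.
            "_source": {"enabled": True},
            "properties": {
                # Searched, but never sorted or aggregated on: skipping
                # doc_values shrinks the on-disk columnar structures.
                "request_path": {"type": "keyword", "doc_values": False},
                # Aggregated on: leave doc_values at the default (enabled).
                "status_code": {"type": "integer"},
            },
        }
    },
)
```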

How AI can help data infrastructure optimization

Most of the solutions on the market so far are based on human labor: data experts monitor the health of cluster workloads and turn knobs based on their observations. However, this approach does not scale. With more and more clusters to monitor, and an increasing variety of technologies involved in a data project, relying on humans to watch for anomalies, understand them, and address them manually is neither future-proof nor affordable. A single expert's sick day can mean slowness, downtime, and the financial impact that follows.

This is where AI can help. By automatically monitoring data infrastructure clusters, identifying anomalies, and suggesting best practices to implement, we can reduce the burden on humans, improve their productivity, and enhance the overall performance and cost of the data infrastructure. Moreover, by pushing performance optimization suggestions to developers and data professionals, we can raise the data literacy and education of the entire workforce.

The results of optimization

The net result of the AI-driven optimization process is not only a better data experience, but also increased developer productivity and education. Once teams are aware of what is possible and what can be automated, the improvements not only speed things up but also reduce costs, as wasted infrastructure can be removed or repurposed. Here at Aiven, our goal is to help our customers rightsize their plans, directly impacting their costs and improving the bottom line. We look forward to the future when we can embed our AI-powered optimization technologies across the product portfolio of leading open source databases and data streaming technologies.

