Aiven Blog

Jan 25, 2024

Materialized views in Aiven for ClickHouse® optimize queries for speed and freshness

Execute even complex queries quickly without the risk of stale results. See what makes Aiven for ClickHouse® ideal for analytics wherever your data lives.

Arthur Ansquer, Technical Product Manager

Arthur Ansquer

|RSS Feed

Businesses nowadays strive to make data-driven decisions to ensure that they invest resources wisely as their solutions grow and evolve. The challenge does not lie in gathering the requisite information, but rather in effectively harnessing this abundance of data to discover insights into your products or your customers. This issue is compounded by optimization challenges, where insights come at too high a cost and too slow a pace to make a meaningful impact.

Materialized views (MVs for short) help solve this very issue. Yet, despite their potential, they remain misunderstood or underutilized. By exploring the capabilities of materialized views in Aiven for ClickHouse®, this post will illustrate how we can optimize analytics, reduce costs, and achieve immediate and profound insights into customer behavior.

What are materialized views?

A materialized view is a precomputed result set that stores the output of a query, allowing for faster retrieval of data. Unlike traditional views, which are essentially saved queries that don't store data themselves, materialized views store the query results. This precomputed nature sets materialized views apart from regular views and tables, which require actual computation and data retrieval with each use, making them more resource intensive. Materialized views don’t involve recomputing the results each time they are run, providing a low-cost and quick way to re-run even the most complex query.

The tradeoff for this performance is that materialized views need to be periodically refreshed as the data in the source tables change. This is done either via scheduled, trigger based, or incremental refreshes of the materialized views. By using materialized views strategically, database administrators and developers can strike a balance between query performance and data freshness.

ClickHouse® MVs harness the power of columnar data management

In ClickHouse®, materialized views operate based on the same foundational concept of precomputing and storing query results described above. However, their implementation and benefits are tailored to ClickHouse's specific capabilities: columnar storage ensures higher compression of data and faster query performance compared to traditional row-oriented databases, and the MergeTree family of engines makes the necessary refresh of the materialized views as efficient as possible, consuming the least amount of resources. Computation happens at ingest and merging time instead of query time, making materialized views in ClickHouse even faster than with other DBMS like PostgreSQL or Snowflake.

Moreover, materialized views in ClickHouse allow for automated data transformation, since data from an insert can be automatically inserted into more than one table: ClickHouse’s powerful aggregations and joins capabilities can be stored as a materialized view to speed up queries that require resource-intensive calculations.

Say you are a social media analytics platform that provides insights to businesses about their online presence and customer engagement. You are tracking hundreds of thousands of posts, comments and pieces of content for various customers. Your database grows to tens of millions of rows as you collect interactions, and it becomes increasingly harder to maintain fast query speed as you go. Queries to calculate this past week’s total likes or last thirty days average comments per users are getting slower. By setting up materialized views in ClickHouse based on the time of insertion, you can query only the latest inserted data and report only the most recent likes and comments aggregated across the different social media platforms instead of the whole database. This makes the data retrieval process fast and light on resources, even over large datasets and complex queries.

ClickHouse® MVs are essential to operate at scale

Materialized views in ClickHouse are not only efficient, they are also instrumental to ensuring your query performance doesn't degrade as your data grows. Picture yourself as an e-commerce organization with thousands of products for millions of users. The challenge to keeping on top of your sales analytics is making sure you are taking into account the latest sales you’re making, for the most accurate picture of your activities possible. It is also important to consider all the variables of a sale such as the product, the region of the buyer or the time it was made. Each of those tend to increase the number of columns in your tables, impacting query performance over time and leading your decision making to lag behind.

In order to follow the right course of action in real time you need the robustness of ClickHouse to aggregate all the sales data without worrying about the number of columns. Moreover, materialized views take this a step further by precomputing all the aggregations such as total revenue, average order value, and top-selling products for you to display those metrics in your business intelligence dashboard of choice. You will not have to worry again that the data is up-to-date or fully accurate and can repeat the queries as often as you want without putting additional strain on your database. Your marketing team will be able to identify your top selling articles, your customer success team can view regional influxes in demand as they happen, and you can adjust pricing and inventory as sales evolve.

Keeping data fresh with Aiven for Apache Kafka®

As mentioned previously, the biggest tradeoff when using materialized views is that their data needs to be regularly refreshed to prevent it from going stale. Take the e-commerce example above: using MVs to populate dashboards and provide insights are only as beneficial as they are accurate and up-to-date. So the question becomes, how often should that data be refreshed? This is where Aiven's holistic approach to data management truly shines.

Aiven for ClickHouse® allows you to get up and running quickly, with the ability to start a ClickHouse instance, populate your data, and create MVs in a few minutes. Because Aiven also provides streaming capabilities in Aiven for Apache Kafka®, and seamless integrations between services with just a few clicks in the Aiven console, you can get as close as possible to real time data analytics by integrating Kafka with ClickHouse.

Returning to our e-commerce example, you can use Aiven for Apache Kafka to capture all the status updates on your products and shipments. ClickHouse has a Kafka table engine that can read from Kafka topics directly, and trigger an MV with each event to persist that data and update any associated queries. This ensures you always have fresh data and can make decisions with confidence that they're based on the latest developments available.

Get started with MVs in Aiven for Clickhouse®

Now it's time to put what we've covered to use. The Aiven Docs provide tutorials for loading sample data into ClickHouse, as well as connecting Aiven for Apache Kafka with Aiven for ClickHouse. Once you're comfortable with the basics, be sure to check out this end-to-end example of creating a real-time analytics pipeline with Apache Kafka®, Apache Flink®, and ClickHouse®, all within the Aiven platform.

Related resources

  • Connecting Apache Kafka® and Aiven for ClickHouse® illustration

    Sep 20, 2022

    Learn how to perform analytics with fast response time over huge volumes of data without the need to downsample data.

  • What is ClickHouse®? illustration

    Sep 14, 2022

    Big data can help you reach complex business decisions. Find out how ClickHouse is designed to get to the bottom of big data processes.

  • Aiven for ClickHouse® now generally available illustration

    Dec 13, 2022

    With the general availability of Aiven for ClickHouse®, including a 99.99% availability SLA, you can now create a fully managed data warehouse to run your analytics pipelines and other analytics workloads.