Nov 28, 2025
Replacing cron jobs and dbt pipelines with ClickHouse Refreshable Materialized Views
Alexander Fridriksson
|RSS FeedAlexander does product marketing for databases at Aiven. Throughout his career he's worked across marketing, sales, analytics, data engineering and more.
For ClickHouse to achieve the speed it's known for, it has historically relied on a trigger-based and incremental architecture for materialized views.
This architecture is known to be very efficient but lacks flexibility, such as when working with data that needs to be backfilled rather than append-only.
The Problem with Incremental Materialized Views
ClickHouse Materialized Views have historically functioned like triggers, firing only when new data is inserted into the source table. They then transform that single block of data and append it to the target table. They don’t capture historical data and don’t maintain a link to the source table.
This created three major problems:
- No updates or deletes: If you updated a record in your source table, the Incremental MV would not reflect this change, as it only triggers on new inserts.
- Complex re-calculations: If you changed your aggregation logic, you couldn't just "refresh" the view. You had to drop the view, then create a new one and manually backfill the data.
- Orchestration overhead: To get around this, you had to write external scripts, such as using Airflow, Dagster, or Cron, to periodically drop tables and run
INSERT INTO ... SELECTqueries just to get a clean snapshot of the data.
The Refreshing Solution
The introduction of Refreshable Materialized Views lets you control the update frequency of your views directly in ClickHouse, without the need for external orchestration tools. While this is not a complete replacement for complex orchestration, it can replace many simple cron jobs and even some dbt pipelines.
Replacing cron jobs
The magic lies in the REFRESH clause.
-- Can be used to periodically pull data from S3 CREATE MATERIALIZED VIEW refreshing_summary REFRESH EVERY 1 HOUR TO refreshed_target AS SELECT ... FROM s3('s3://sample-bucket/*.parquet') -- More complex example: -- 20th day of every month at 11:30 pm REFRESH EVERY 2 WEEK OFFSET 19 DAY 23 HOUR 30 MINUTE
When using the REFRESH clause, ClickHouse will automatically re-run the query, replacing the old data with the latest results from the source. Anything you can represent with cron, you can do in ClickHouse, just with a different syntax.
You can also manually refresh, modify, and audit the schedules.
-- Start a manual refresh SYSTEM REFRESH VIEW refreshing_summary; -- Change the refresh frequency ALTER TABLE refreshing_summary MODIFY REFRESH EVERY 45 MINUTE; -- Audit view refreshes SELECT database, view, status, last_success_time, last_refresh_time, next_refresh_time FROM system.view_refreshes;
Replacing dbt pipelines
Taking this a step further, we can use the DEPENDS ON clause to synchronize refreshes of different tables. This is similar to the ref function in dbt.
-- Using the dbt ref function SELECT ... FROM {{ ref('stg_customers') }} -- Using the ClickHouse DEPENDS ON clause CREATE MATERIALIZED VIEW customers REFRESH EVERY 1 DAY DEPENDS ON stg_customers AS SELECT ... FROM stg_customers
If you are familiar with how dbt works, this might seem straightforward, but let’s go through what is actually happening here.
By saying that customers depends on stg_customers ClickHouse will not trigger the refresh of customers until the refresh of stg_customers completes successfully.
If we wanted, we could also chain multiple views to create a dependency graph, or in other words, a Directed Acyclical Graph (DAG) that transforms raw data into something useful.
If the source view fails to refresh, the dependent view will skip its refresh to avoid processing inconsistent or stale data.
ClickHouse does not replace the full power of dbt, but it gives you the ability to choose what kind of complexity you want. By making it simple to create simple cron jobs or pipelines, you can choose if the overhead of a Python framework is worth the extra complexity, or if all you need is a simple REFRESH.
ClickHouse gives you the ability to choose
While Refreshable Materialized Views are a great addition, they do not replace Incremental Materialized Views, which are generally more efficient. ClickHouse now gives you more flexibility to choose what you need.
You can now do both the real-time streaming ClickHouse is known for and the traditional batch workloads you might have run in other data warehouses.
You might even be able to simplify your operations by replacing simple cron jobs and dbt pipelines by allowing ClickHouse to orchestrate them with the REFRESH and DEPENDS ON clauses.
You can try this out today with Aiven for ClickHouse.
Stay updated with Aiven
Subscribe for the latest news and insights on open source, Aiven offerings, and more.



