Aiven Blog

Aiven for ClickHouse 25.8 LTS: Vector Search GA, Projections, Correlated Subqueries, and Faster Queries

Vector Search GA & SQL Enhancements.

Sergii Smyrnov

|RSS Feed

Senior Engineering Manager

Aiven for ClickHouse 25.8 is now available as an Early Availability. This Long-Term Support release introduces lightweight projections as secondary indexes, general availability of vector search with binary quantization, correlated subqueries for broader SQL compatibility, lightweight updates for MergeTree tables, and significant performance and data lakehouse improvements.

Customers on Aiven for ClickHouse 25.3 can upgrade to 25.8 through the Aiven Console. Note that downgrades from 25.8 back to 25.3 are not supported, so we recommend forking your service before upgrading. A fork creates a full copy of your service that you can use to validate the new version against your workloads before committing to the upgrade on your production service.

Here's what matters for your workloads.

Vector Search Reaches General Availability

Approximate nearest-neighbor search using the HNSW index is now GA. Version 25.8 adds binary quantization for the vector similarity index, which drastically reduces memory consumption and improves indexing speed — a critical improvement for large-scale embedding workloads in RAG pipelines, recommendation systems, and semantic search.

Projections as Lightweight Secondary Indexes

ClickHouse has historically been limited to a single primary index per table. Projections now function as lightweight secondary indexes — they store only the sorting key and a pointer back to the base table, without duplicating the full dataset.

This enables granule-level pruning across multiple index paths simultaneously.

If your queries frequently filter on columns outside the primary key, projections eliminate the need for full scans without the storage overhead of materialized views.

Consider a typical web analytics table ordered by (tenant_id, timestamp):

Loading code...

The primary index handles the most common query pattern — filtering by tenant and time range. But when a support engineer needs to look up all events for a specific user, or a product team queries by country, those filters don't align with the primary key.

Without projections, ClickHouse scans every granule in matching parts. With the lightweight projections above, it narrows the scan to only the granules that contain the target user_id or country — without storing a second copy of the data.

Loading code...

ClickHouse combines the primary index (to prune by tenant_id) with the by_user projection (to prune by user_id) in a single pass, examining only the granules that satisfy both conditions.

Lightweight Updates (Beta)

ClickHouse 25.8 introduces UPDATE ... SET ... WHERE directly on MergeTree tables using a new patch parts mechanism. This is a long-awaited capability — historically, ClickHouse mutations rewrote entire data parts, making point updates expensive and slow.

Lightweight updates write small patch parts that are merged asynchronously, making single-row or small-batch updates practical for the first time. This is particularly useful for correcting records, updating statuses, or maintaining slowly changing dimensions without the overhead of full mutations.

Correlated Subqueries: Standard SQL Compatibility

ClickHouse 25.8 adds support for correlated subqueries, closing one of the most significant SQL compatibility gaps. This is a high-impact change for teams migrating from PostgreSQL, BigQuery, or Snowflake — queries that previously required rewriting with JOINs or arrays now work as-is.

This dramatically reduces the effort needed to port existing SQL workloads to ClickHouse.

For example, finding customers whose latest order exceeds their personal average is a single query in standard SQL:

Loading code...

Before 25.8, this required restructuring with JOINs or window functions:

Loading code...

Both produce the same result, but the correlated subquery version is what most SQL developers write first — and what existing applications already use. Not needing to rewrite these patterns removes a real barrier to migration.

Additionally, 25.8 adds lag and lead window functions, another staple of standard SQL that was previously missing. Combined with correlated subqueries, this significantly reduces the rewrite surface when migrating analytical queries from other databases.

Performance Improvements

Parquet Reader v3 with Filter Pushdown

The new Parquet reader, enabled by default, delivers significant speedups when querying external Parquet files on S3, GCS, and Azure Blob Storage. Filter pushdown means ClickHouse reads only the relevant row groups and columns, reducing both I/O and processing time.

Faster Azure Client

The Azure Blob Storage client has been optimized for lower latency and higher throughput, benefiting all Azure-based external table queries and data lake operations.

Lazy Materialization

ClickHouse now defers reading columns until after ORDER BY and LIMIT are applied. For wide tables where queries select many columns but return few rows, this avoids reading column data that will be discarded — a meaningful speedup for exploratory queries and paginated results.

Query Condition Cache

A new caching layer temporarily stores query conditions, effectively creating in-memory indexes for repeated queries with similar filter patterns. This is particularly useful for dashboard workloads where users repeatedly query the same dimensions with slight variations.

Data Lakehouse Integration: Iceberg and Delta Lake Improvements

ClickHouse 25.8 brings substantial improvements to the Iceberg and Delta Lake table engines.

Apache Iceberg

  • Position and equality deletes — ClickHouse now correctly handles row-level deletes in Iceberg tables, with automatic merge of position deletes into data files
  • Write supportCREATE, ALTER, ADD COLUMN, and DROP COLUMN operations work directly against Iceberg tables
  • Stats-based pruning — Iceberg metadata statistics are now used for partition and file pruning, reducing scan overhead
  • Read-in-order optimization — When Iceberg tables define a sort order, ClickHouse leverages it for ordered reads

Delta Lake

  • Write support — ClickHouse can now write to Delta Lake tables
  • Snapshot version reads — Query specific Delta table versions for reproducible analytics
  • AWS session token authentication — Supports temporary credentials for secure S3 access

Hive Partitioning

Read and write support for Hive-style partitioning is now enabled more broadly, simplifying integration with existing data lake layouts on S3, GCS, and Azure Blob Storage.

Breaking Changes to Watch For

Before upgrading, review these backward-incompatible changes that may affect your workloads:

  • annoy and usearch vector index types have been removed. If your tables use these legacy index types, you must drop them before upgrading to 25.8. The GA vector similarity index (HNSW) is the replacement.
  • geoToH3() argument order changed from (lon, lat, res) to (lat, lon, res). Any queries using this function must be updated.
  • 64-bit integers are no longer quoted in JSON output formats in upstream ClickHouse. Aiven sets output_format_json_quote_64bit_integers = 1 in the default profile, so existing behavior is preserved — your JSON consumers will continue to receive quoted integers. If you want to opt into the new upstream default, set this to 0 in your query or user settings.
  • countMatches() behavior changed for zero-length patterns. Patterns with *, ?, or lookaheads that can match zero-length strings now skip past empty matches and continue searching instead of stopping at the first empty match. This produces more results than before. If your queries rely on countMatches() with such patterns, verify the output after upgrading.
  • Nested column identifier resolution changed. With the new default for analyzer_compatibility_allow_compound_identifiers_in_unflatten_nested, SELECT x FROM tab on tables with nested columns like x.a, x.b.first now includes all subcolumns (not just single-part suffixes). Compound identifiers like y.b.first also work after ARRAY JOIN. If your queries depend on the previous narrower resolution, set this to 0.
  • Parquet Enum encoding changed. Enum values are now written as BYTE_ARRAY with ENUM logical type instead of plain integers (output_format_parquet_enum_as_byte_array). This is more standard and preserves string values, but downstream Parquet consumers that expect integer-encoded enums may need adjustment.

What This Means for Aiven Customers

This upgrade is available now as an Early Availability across all Aiven for ClickHouse plans. Existing services on 25.3 can be upgraded through the Aiven Console. Remember to fork your service first — downgrades are not supported.

Key takeaways:

  • Migration teams: Correlated subqueries remove a major rewrite burden when moving from PostgreSQL, BigQuery, or Snowflake
  • AI/ML teams: GA vector search with binary quantization makes ClickHouse a viable vector database for production embedding workloads
  • Performance-sensitive workloads: The new Parquet reader, query condition cache, and projection-based secondary indexes deliver measurable improvements without schema changes
  • Data lakehouse users: Iceberg and Delta Lake beta support means you can start evaluating ClickHouse as a query layer over your existing data lake without moving data

Get Started


Stay updated with Aiven

Subscribe for the latest news and insights on open source, Aiven offerings, and more.

Related resources