Create dictionaries in Aiven for ClickHouse®
Create dictionaries in Aiven for ClickHouse® to accelerate queries for better efficiency and performance.
Dictionaries in Aiven for ClickHouse
A dictionary is a key-attribute mapping useful for low latency lookup queries, when often looking up attributes for a particular key. Dictionary data resides fully in memory, which is why using a dictionary in JOINs is often much faster than using a MergeTree table. Dictionaries can be an efficient replacement for regular tables in your JOIN clauses.
Aiven for ClickHouse supports backup and restore for dictionaries. Also, dictionaries in Aiven for ClickHouse are automatically replicated to all service nodes.
Read more on dictionaries in the upstream ClickHouse documentation.
Prerequisites
- Aiven for ClickHouse service created
- SQL client installed
- Dictionary source available
- Credentials integration for remote ClickHouse, PostgreSQL®, or MySQL® if to be used as sources
Limitations
-
Only TLS connections supported
-
If no host is specified in a dictionary with a ClickHouse source, the local host is assumed, and the dictionary is filled with data from a query against the local ClickHouse, for example:
-- users table
CREATE TABLE default.users
(
id UInt64,
username String,
email String,
country String
)
ENGINE = MergeTree()
ORDER BY id;
CREATE DICTIONARY default.users_dictionary
(
id UInt64,
username String,
email String,
country String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(DB 'default' TABLE 'users'))
LAYOUT(FLAT())
LIFETIME(100);In Aiven for ClickHouse, to fill the dictionary the table users are queried with the permissions of the
avnadmin
user even if another user creates the dictionary. In upstream ClickHouse, the same is true except thedefault
user is used. -
In Aiven for ClickHouse, setting dictionaries_lazy_load is set to
true
, which means that errors with dictionary source parameters may only become apparent when the dictionary is loaded on the first use, rather than when it is created.
Supported layouts
Aiven for ClickHouse supports the same
layouts that the upstream ClickHouse supports
with two exceptions,ssd_cache
and complex_key_ssd_cache
, which are not supported.
Supported sources
- HTTP(s)
- Remote ClickHouse
- Aiven for ClickHouse
- Remote MySQL®
- Aiven for MySQL
- Remote PostgreSQL®
- Aiven for PostgreSQL
Create a dictionary
To create a dictionary with specified structure (attributes), source, layout, and lifetime, use the following syntax:
CREATE [OR REPLACE] DICTIONARY [IF NOT EXISTS] [db.]dictionary_name
(
key1 type1 [DEFAULT|EXPRESSION expr1] [IS_OBJECT_ID],
key2 type2 [DEFAULT|EXPRESSION expr2],
attr1 type2 [DEFAULT|EXPRESSION expr3] [HIERARCHICAL|INJECTIVE],
attr2 type2 [DEFAULT|EXPRESSION expr4] [HIERARCHICAL|INJECTIVE]
)
PRIMARY KEY key1, key2
SOURCE(SOURCE_NAME([param1 value1 ... paramN valueN]))
LAYOUT(LAYOUT_NAME([param_name param_value]))
LIFETIME({MIN min_val MAX max_val | max_val})
SETTINGS(setting_name = setting_value, setting_name = setting_value, ...)
COMMENT 'Comment'
Examples
Speeding up JOIN
s
-
Create tables in your ClickHouse database:
CREATE TABLE users
(
id UInt64,
username String,
email String,
country String
)
ENGINE = MergeTree()
ORDER BY id;CREATE TABLE transactions
(
id UInt64,
user_id UInt64,
product_id UInt64,
quantity Float64,
price Float64
)
ENGINE = MergeTree()
ORDER BY id; -
Create a dictionary for the
users
table:CREATE DICTIONARY users_dictionary
(
id UInt64,
username String,
email String,
country String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(DB 'default' TABLE 'users'))
LAYOUT(FLAT())
LIFETIME(100);You can do the same using the
QUERY
parameter:CREATE DICTIONARY users_dictionary
(
id UInt64,
username String,
email String,
country String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(QUERY 'SELECT id, username, email, country FROM default.users'))
LAYOUT(FLAT())
LIFETIME(100);
JOIN
s are much faster as the data is pre-indexed in memory.
SELECT
t.id,
u.username,
t.product_id,
t.quantity,
t.price
FROM transactions AS t
ANY LEFT JOIN users_dictionary AS u
ON t.user_id = u.id;
Caching data from an external database or URL
-
Create a dictionary for the
pricing
table in your MySQL database using a composite key:CREATE DICTIONARY product_pricing
(
product_id UInt64,
region String,
price Float64 DEFAULT 0.0
)
PRIMARY KEY product_id, region_id
SOURCE(MYSQL(NAME mysql_named_collection DB 'product_db' TABLE 'pricing'))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(MIN 600 MAX 900);This will periodically query MySQL and store the data in memory.
-
Create a dictionary for the
pricing
table in your PostgreSQL database using theFLAT
layout:CREATE DICTIONARY product_pricing
(
product_id UInt64,
price Float64 DEFAULT 0.0
)
PRIMARY KEY product_id
SOURCE(POSTGRESQL(NAME psql_named_collection DB 'product_db' SCHEMA 'schema' TABLE 'pricing'))
LAYOUT(FLAT())
LIFETIME(0);Because
LIFETIME
is0
, it has to be manually refreshed as follows:SYSTEM RELOAD DICTIONARY product_pricing;
-
Create a dictionary with
HTTP
as a source:CREATE DICTIONARY currency_rates
(
currency_code String,
rate Float64 DEFAULT 1.0
)
PRIMARY KEY currency_code
SOURCE(HTTP(URL 'https://example.com/currency_rates.csv' FORMAT CSV))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(100); -
Create a dictionary for the
users
table in a remote ClickHouse database using theFLAT
layout:CREATE DICTIONARY users_dictionary_remote
(
id UInt64,
username String,
email String,
country String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(NAME remote_clickhouse_named_collection DB 'default' TABLE 'users'))
LAYOUT(FLAT())
LIFETIME(100);