Glossary

Explore definitions of key terms and concepts

A-D

Apache Cassandra®

Apache Cassandra is an open-source, distributed, wide-column NoSQL database management system designed to handle large amounts of data across multiple servers.

Resources:

An introduction to Apache Cassandra®

Get started

Apache Flink®

Apache Flink is an open-source stream processing framework for big data processing and analytics. It supports event time processing and provides high-throughput, low-latency data processing.

Resources:

An introduction to Apache Flink®

Get to know the Aiven API with Postman

Apache Kafka®

Apache Kafka is an open-source event streaming platform used for building real-time data pipelines and streaming applications. It facilitates the processing of large-scale, real-time data feeds.

Resources:

What is Apache Kafka®?

Apache Kafka® simply explained

Docs: Apache Kafka® concepts

BYOC

Bring Your Own Cloud (BYOC) is a deployment model where managed data services are deployed directly to the customer's own cloud account. This means that customers can enjoy the managed service experience, while all compute, storage and networking infrastructure services - and associated costs - remain under their direct control.

Resources:

What is BYOC?

Optimize your cloud data infrastructure spend

Webinar: Achieve lower TCO and keep full control of your data

Docs: About BYOC

Caching

Caching is a technique used to store frequently accessed data in a temporary storage location to reduce access time and improve system performance. It helps in minimizing the time required to retrieve data by keeping a copy of it in a fast cache closer to the application that uses it.
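As a rough illustration of the idea, the sketch below keeps copies of looked-up values in memory and only goes back to the slow source on a miss or after an entry expires. The `TTLCache` class, the `slow_lookup` function, and the 60-second expiry are hypothetical names and values chosen for this example; they do not represent any particular caching product.

```python
import time


class TTLCache:
    """A tiny in-process cache with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def get_or_load(self, key, loader):
        """Return the cached value, or call loader(key) on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: no call to the slow source
        value = loader(key)  # cache miss: fetch from the slow source
        self._entries[key] = (value, now + self.ttl_seconds)
        return value


calls = []


def slow_lookup(key):
    # Hypothetical stand-in for a database query or remote API call.
    calls.append(key)
    return key.upper()


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load("user:1", slow_lookup)   # miss: calls slow_lookup
second = cache.get_or_load("user:1", slow_lookup)  # hit: served from memory
```

In this sketch the second lookup never reaches `slow_lookup`, which is where the reduction in access time comes from.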

Resources:

Docs: High availability in Aiven for Caching

What is Dragonfly?

Aiven for Valkey

ClickHouse

ClickHouse is an open-source columnar database management system designed for fast analytical processing of large volumes of data. Its key use case is for real-time online analytical processing (OLAP).

Resources:

What is ClickHouse®?

Docs: ClickHouse® as a columnar database

Aiven for ClickHouse®

Cloud infrastructure costs

Cloud infrastructure costs include the expenses associated with the provision and usage of cloud computing resources, such as virtual machines, storage, and network bandwidth.

Resources:

What to consider when optimizing cloud infrastructure costs

Ebook: 7 ways to optimize your cloud infrastructure costs

Data pipeline

A data pipeline is a series of processes and components that facilitate the automated and controlled movement of data from source to destination, often involving data extraction, transformation, and loading (ETL). Data pipelines are used to enable data integration, analysis, and storage.

What Is a Data Pipeline? (Definition & Examples)

Resources:

Case study: Priceline

The future of data pipelines

Webinar: How to build data analytics pipelines faster than your morning commute

Data streaming

Data streaming is a method of transmitting and processing data continuously in real time. It allows for the efficient and immediate transfer of information, making it valuable for various applications such as event-driven architectures and microservices, marketing personalization, real-time analytics, change data capture, real-time AI recommendations, monitoring, and much more.

Data streaming: Your gateway to real-time insights

Resources:

Navigating Kafka: Challenges, solutions, and the future of real-time data streaming

Set up your data streaming infrastructure in 30 minutes

Get Real: Real-time data to drive real-time business

Create your own data stream for Apache Kafka® with Python and Faker

Database Administrator (DBA)

DBAs, or Database Administrators, are professionals responsible for managing and maintaining databases, ensuring their performance, security, and reliability.

Resources:

Why DBAs embrace managed services

E-L

ETL

ETL, or Extract, Transform, Load, is a process used in data integration where data is extracted from source systems, transformed into a suitable format, and loaded into a target database, data warehouse, or data lake for analysis and reporting.
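The three stages can be sketched in a few lines of Python. This is a minimal example using only the standard library: the CSV text, the `orders` table, and the cleaning rules (skip rows with no amount, convert amounts to cents, upper-case the currency) are all made up for illustration, and an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file or API.
RAW_CSV = """order_id,amount,currency
1,19.99,eur
2,5.00,usd
3,,eur
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows and normalize types and formats."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip rows with a missing amount
        cleaned.append(
            (
                int(row["order_id"]),
                round(float(row["amount"]) * 100),  # store amounts as cents
                row["currency"].upper(),
            )
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount_cents INTEGER, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, amount_cents, currency FROM orders ORDER BY order_id"
).fetchall()
```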

What Is ETL?

Resources:

Aiven for Apache Kafka

Reverse your ETL

Move from Batch to Streaming

Docs: Aiven for Apache Kafka

Event and processing times

Event time and processing time are two timing aspects of data processing in a system: "event time" is the time when an event occurs at its source, and "processing time" is the time when the system processes that event.
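The difference matters most when events arrive late. The sketch below is an illustration only: the `Event` class, the `lag` and `window_start` helpers, and the 60-second window size are assumptions made for this example. It shows one event that lands in different one-minute windows depending on which of the two times is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    event_time: float       # seconds: when the event happened at the source
    processing_time: float  # seconds: when the system handled it


def lag(event):
    """Delay between the event occurring and the system processing it."""
    return event.processing_time - event.event_time


def window_start(timestamp, size=60):
    """Start of the fixed-size window that contains the timestamp."""
    return int(timestamp // size) * size


# A click that happened at second 59 but was only processed at second 65.
late_click = Event("click", event_time=59.0, processing_time=65.0)

by_event_time = window_start(late_click.event_time)            # first minute
by_processing_time = window_start(late_click.processing_time)  # second minute
```

Grouping by event time places the click in the minute in which it actually happened, while grouping by processing time counts it a minute later.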

Resources:

Docs: Event and processing times

Event streaming

Event streaming refers to the continuous flow of events or data in real time, allowing systems to react to and process events as they occur. It is commonly used for applications like real-time data analysis, monitoring, and building event-driven architectures.

Resources:

Aiven for Event Streaming

Evaluating your event streaming needs, the software architect way

Solving problems with event streaming

Case study: Simplilearn

Webinar: 4 ways to think like a software architect while evaluating solutions

Event-driven architecture

Event-driven architecture is a design approach where the flow of information and functionality is based on events or messages, with components reacting to events rather than relying on centralized control. This architecture is often used in real-time and distributed systems to improve scalability and responsiveness.
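A minimal in-process sketch of the pattern follows. The `EventBus` class, the `order_placed` event type, and the two handlers are hypothetical; in a real distributed system the bus would typically be a message broker rather than a Python object.

```python
from collections import defaultdict


class EventBus:
    """Routes published events to whichever handlers subscribed to them."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The publisher does not know who, if anyone, is listening.
        for handler in list(self._handlers.get(event_type, [])):
            handler(payload)


bus = EventBus()
emails, invoices = [], []

# Two independent components react to the same event.
bus.subscribe("order_placed", lambda order: emails.append(f"confirm order {order['id']}"))
bus.subscribe("order_placed", lambda order: invoices.append(order["total"]))

bus.publish("order_placed", {"id": 42, "total": 19.99})
```

The code that publishes `order_placed` never calls the email or invoicing logic directly, so new reactions can be added by subscribing another handler.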

Resources:

Introduction to event-driven architecture

Google BigQuery

Google BigQuery is a fully-managed, serverless data warehouse that enables super-fast SQL queries using the processing power of Google's infrastructure.

Resources:

Fast analytics with Aiven for Apache Kafka® and Google BigQuery

Docs: Integrate Aiven for Apache Flink® with Google BigQuery

Hosted data streaming

Hosted data streaming refers to a service where the infrastructure and resources for data streaming are provided and managed by a third-party hosting provider.

Resources:

Aiven for Apache Kafka®

In-memory database

An in-memory database is a type of database management system that stores data primarily in the main memory (RAM) rather than on disk storage. This allows for significantly faster data access and processing compared to traditional disk-based databases. Examples include Valkey (an open source alternative to Redis®) and Dragonfly.

Resources:

Introducing Aiven for Valkey

What is Dragonfly?

Aiven for Valkey

Kafka messaging

Kafka messaging involves the use of Apache Kafka®, a distributed streaming platform, to facilitate the seamless and fault-tolerant exchange of real-time data between applications, enabling efficient data integration and communication.

Resources:

What is Apache Kafka®?

Karapace

Karapace is an open source project that provides a schema registry and REST API for Apache Kafka®. It is built and maintained by Aiven and licensed under Apache 2.0.

Resources:

What is Karapace?

Karapace strengthens schema management

Docs: Use Karapace with Aiven for Apache Kafka®

Docs: Get started

Aiven for Kafka®

Key-value database

A key-value database is a type of NoSQL database that uses a simple key-value pair mechanism to store data. Each key is unique and maps directly to a single value, enabling efficient retrieval and storage of data. Examples include Valkey (an open source alternative to Redis®) and Dragonfly. Relational databases such as PostgreSQL® can also store key-value data, for example through the hstore extension.

Resources:

Introducing Aiven for Valkey

Introduction to Redis®

Aiven for Valkey

Aiven for PostgreSQL®

Klaw

Klaw is a web-based, fully open source data governance toolkit for Apache Kafka® topic and schema governance. Klaw helps Kafka admins add and define roles for Kafka users, create Kafka topics, manage schemas, authorize producers and consumers, manage connectors, and more.

Resources:

Introducing Klaw for Apache Kafka® governance

Docs: Connect Aiven for Apache Kafka® with Klaw

Klaw in 2022: simplifying Apache Kafka data governance

Aiven for Kafka®

Kubernetes

Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications in cloud-native environments.

Resources:

Kubernetes vs. Terraform

Docs: Aiven Operator for Kubernetes®

M-R

Microservice

A microservice is a small, independent service that communicates with other services over well-defined APIs. In a microservices architecture, a system is composed of many such services, which promotes flexibility and scalability.

Resources:

How are your microservices talking?

MySQL®

MySQL® is an open-source relational database management system (RDBMS) widely used for storing and managing structured data. It uses SQL (Structured Query Language) for database management and is known for its reliability and performance.

Resources:

What is MySQL?

PostgreSQL®

PostgreSQL is an open source relational database management system (RDBMS) that has a strong reputation for reliability, feature robustness, and performance. Frequently called Postgres, it is SQL compliant, provides atomicity, consistency, isolation, and durability (ACID) properties, and is commonly used as a large datastore for analytics and web services with many concurrent users.

Resources:

Introduction to PostgreSQL®

PostgreSQL® concepts and terms

Use cases for PostgreSQL®

Aiven for PostgreSQL®

Python data stream

A Python data stream is a continuous flow of data produced or consumed by a Python program, often used in scenarios like data processing and analysis.

Resources:

Create your own data stream for Apache Kafka® with Python and Faker

Real-time analytics

Real-time analytics involves the analysis of data as it is generated to provide immediate insights and decision-making capabilities.

Resources:

Why you should think about moving analytics from batch to real-time

Build a real-time analytics pipeline in less time than your morning bus ride

Real-time data

Real-time data refers to information that is available immediately as it is generated or becomes relevant, without any significant delay. Real-time data is essential for applications like stock trading, monitoring, and dynamic decision-making.

Resources:

Real-time stock data with Apache Flink® and Apache Kafka®

Rsyslog

rsyslog is open-source software used for centralizing and managing log messages in a Unix or Unix-like environment.

Resources:

Docs: Remote syslog integration

S-Z

Stream processing

Stream processing is the practice of processing and analyzing data as it is continuously generated, without the need to store and batch-process it first. It is suitable for real-time applications like fraud detection, recommendation systems, and IoT data processing.
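The core idea, handling each record as it arrives instead of collecting a batch first, can be sketched with a Python generator. The `running_average` function and the sample sensor readings are assumptions made for this example; production stream processing usually runs on a framework such as Apache Flink® rather than in a single Python process.

```python
def running_average(stream):
    """Yield an updated average after every incoming value.

    Only the running count and total are kept in memory, so the
    stream never has to be stored or batched first.
    """
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


def sensor_readings():
    # Hypothetical stand-in for an unbounded source such as an IoT device.
    yield from [20.0, 22.0, 27.0]


averages = list(running_average(sensor_readings()))
```

Each output value is available as soon as its input arrives, which is what makes this style suitable for low-latency use cases.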

Resources:

Aiven for Apache Flink®

Webinar: Aiven for Apache Flink® - A new developer experience for data stream processing

Streaming data analytics

Streaming data analytics involves the real-time analysis of continuously generated data streams, allowing organizations to extract meaningful insights and make informed decisions based on up-to-the-moment information.

Resources:

Streaming data analytics in the real world

Terraform

Terraform is an infrastructure as code tool that lets you build, change, and version cloud and on-prem resources safely and efficiently.

Resources:

Aiven Terraform cookbook

Kubernetes vs. Terraform

Time series data

Time series data consists of data points collected and recorded at regular time intervals, allowing for the analysis of trends and patterns over time. It is commonly used for gathering metrics and in applications like weather forecasting, financial analysis, and IoT sensor data analysis.
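A common operation on time series data is downsampling: grouping points into fixed time buckets and summarizing each bucket. The sketch below is illustrative only; the `readings` list, the `downsample` function, and the 60-second bucket size are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (timestamp in seconds, value) pairs recorded every 30 seconds.
readings = [(0, 10.0), (30, 14.0), (60, 20.0), (90, 22.0), (120, 30.0)]


def downsample(points, bucket_seconds):
    """Average the values that fall into each fixed-size time bucket."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


per_minute = downsample(readings, 60)
```

Here five 30-second readings are reduced to three per-minute averages, which makes longer-term trends easier to see.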

Resources:

Managed ClickHouse database as a service

Time series database

A time series database is a specialized database system designed for efficient storage and retrieval of time series data. It is optimized for querying and analyzing data points with timestamps, making it suitable for applications where historical data and trends are essential, such as in monitoring and analytics.

Resources:

Docs: Choosing a timeseries database

Valkey

Valkey is an open source (BSD-licensed), high-performance key-value datastore based on the open source version of the popular Redis® database, which recently changed its licensing model. Valkey supports a variety of workloads, such as caching and message queues, and can also act as a primary database.

Resources:

Introducing Aiven for Valkey

Aiven for Valkey