Explore definitions of key terms and concepts


Apache Cassandra®

Apache Cassandra is an open-source, distributed, wide-column NoSQL database management system designed for handling large amounts of data across multiple servers.

Apache Flink®

Apache Flink is an open-source stream processing framework for big data processing and analytics. It supports event time processing and provides high-throughput, low-latency data processing.

Apache Kafka®

Apache Kafka is an open-source event streaming platform used for building real-time data pipelines and streaming applications. It facilitates the processing of large-scale, real-time data feeds.


Bring Your Own Cloud (BYOC)

Bring Your Own Cloud (BYOC) is a deployment model where managed data services are deployed directly into the customer's own cloud account. Customers get the managed service experience while all compute, storage, and networking infrastructure services - and their associated costs - remain under their direct control.

Cloud infrastructure costs

Cloud infrastructure costs include the expenses associated with the provisioning and usage of cloud computing resources, such as virtual machines, storage, and network bandwidth.

Data pipeline

A data pipeline is a series of processes and components that facilitate the automated and controlled movement of data from source to destination, often involving data extraction, transformation, and loading (ETL). Data pipelines are used to enable data integration, analysis, and storage.
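As an illustrative sketch (the records, field names, and values below are invented), the extract, transform, and load stages of a pipeline can be modeled in a few lines of Python:

```python
# A minimal extract-transform-load (ETL) sketch. The source and destination
# here are in-memory lists; in practice they would be databases, files,
# or message queues.

def extract(source):
    """Extract: read raw records from the source."""
    yield from source

def transform(records):
    """Transform: normalize names and drop incomplete rows."""
    for record in records:
        if record.get("name"):
            yield {"name": record["name"].strip().title(), "age": record.get("age")}

def load(records, destination):
    """Load: write the cleaned records to the destination."""
    destination.extend(records)

source = [{"name": "  ada lovelace ", "age": 36}, {"name": "", "age": 41}]
destination = []
load(transform(extract(source)), destination)
print(destination)  # → [{'name': 'Ada Lovelace', 'age': 36}]
```

Because each stage is a generator, records flow through one at a time rather than being materialized in full between stages.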

Database Administrator (DBA)

DBAs, or Database Administrators, are professionals responsible for managing and maintaining databases, ensuring their performance, security, and reliability.


Event and processing times

Event time and processing time refer to two timing aspects of data processing in a system: event time is when an event actually occurs at its source, and processing time is when the system processes that event.
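A minimal Python sketch of the distinction, using invented events: each event carries its own event time, while processing time is stamped whenever the program actually handles it.

```python
from datetime import datetime, timezone

# Each event records its "event time" (when it happened at the source).
# "Processing time" is whenever this program gets around to handling it,
# so the two can differ by seconds, hours, or more.
events = [
    {"id": 1, "event_time": datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)},
    {"id": 2, "event_time": datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc)},
]

for event in events:
    processing_time = datetime.now(timezone.utc)
    lag = processing_time - event["event_time"]
    print(f"event {event['id']}: occurred {event['event_time']}, "
          f"processed {processing_time} (lag {lag})")
```

The lag between the two is why stream processors such as Apache Flink® support explicit event-time semantics for out-of-order data.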

Event streaming

Event streaming refers to the continuous flow of events or data in real-time, allowing systems to react to and process events as they occur. It is commonly used for applications like real-time data analysis, monitoring, and building event-driven architectures.

Event-driven architecture

Event-driven architecture is a design approach where the flow of information and functionality is based on events or messages, with components reacting to events rather than relying on centralized control. This architecture is often used in real-time and distributed systems to improve scalability and responsiveness.
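One common way to illustrate the idea is a tiny in-process event bus (a simplified sketch; real systems typically use a message broker): components subscribe to event types and react when events are published, without ever calling each other directly.

```python
# A toy publish/subscribe event bus. The event names and handlers
# are invented for illustration.
class EventBus:
    def __init__(self):
        self._handlers = {}

    def subscribe(self, event_type, handler):
        """Register a handler to react to a given event type."""
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        """Deliver an event to every subscribed handler."""
        for handler in self._handlers.get(event_type, []):
            handler(payload)

bus = EventBus()
log = []
bus.subscribe("order_placed", lambda order: log.append(f"ship {order}"))
bus.subscribe("order_placed", lambda order: log.append(f"bill {order}"))
bus.publish("order_placed", "#42")
print(log)  # → ['ship #42', 'bill #42']
```

The publisher knows nothing about its subscribers, which is what makes components easy to add, remove, or scale independently.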

Google BigQuery

Google BigQuery is a fully managed, serverless data warehouse that enables fast SQL queries using the processing power of Google's infrastructure.

Hosted data streaming

Hosted data streaming refers to a service where the infrastructure and resources for data streaming are provided and managed by a third-party hosting provider.

Kafka messaging

Kafka messaging involves the use of Apache Kafka®, a distributed streaming platform, to facilitate the seamless and fault-tolerant exchange of real-time data between applications, enabling efficient data integration and communication.


Karapace

Karapace is an open-source project that provides a schema registry and RESTful HTTP interface for Apache Kafka®, making Kafka easier to manage and monitor.


Kubernetes

Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications in cloud-native environments.



Microservice

Microservices are a software architectural style in which a system is composed of small, independent services that communicate over well-defined APIs, promoting flexibility and scalability.


MySQL®

MySQL® is an open-source relational database management system (RDBMS) widely used for storing and managing structured data. It uses SQL (Structured Query Language) for database management and is known for its reliability and performance.


Python data stream

A Python data stream refers to a continuous flow of data in a Python program, often consumed incrementally in data processing and analysis scenarios.
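In Python, a data stream is often modeled with generators, which produce values lazily one at a time instead of materializing them all in memory. A small sketch (the sensor readings below are invented):

```python
def sensor_stream(readings):
    """Model a data stream as a generator: values arrive one at a time."""
    for value in readings:
        yield value

def running_average(stream):
    """Consume a stream incrementally, emitting the average seen so far."""
    total, count = 0.0, 0
    for value in stream:
        total += value
        count += 1
        yield total / count

averages = list(running_average(sensor_stream([2, 4, 6])))
print(averages)  # → [2.0, 3.0, 4.0]
```

Because both stages are generators, the same code works unchanged on an unbounded stream, such as readings arriving over a network socket.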

Real-time analytics

Real-time analytics involves the analysis of data as it is generated to provide immediate insights and decision-making capabilities.

Real-time data

Real-time data refers to information that is available immediately as it is generated or becomes relevant, without any significant delay. Real-time data is essential for applications like stock trading, monitoring, and dynamic decision-making.


rsyslog

rsyslog is open-source software used for centralizing and managing log messages in a Unix or Unix-like environment.


Stream processing

Stream processing is the practice of processing and analyzing data as it is continuously generated, without the need to store and batch-process it first. It is suitable for real-time applications like fraud detection, recommendation systems, and IoT data processing.
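A toy sketch of the idea in Python (the timestamps and payloads are invented): events are aggregated into fixed ten-second tumbling windows as they arrive, with no need to batch the full stream first.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=10):
    """Count events per fixed-size (tumbling) time window.

    Each event is a (timestamp, payload) pair; events are consumed
    one at a time, as a stream processor would.
    """
    counts = defaultdict(int)
    for timestamp, _payload in events:
        # Align the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

events = [(1, "a"), (3, "b"), (12, "c"), (19, "d"), (25, "e")]
print(tumbling_window_counts(events))  # → {0: 2, 10: 2, 20: 1}
```

Frameworks such as Apache Flink® provide the same windowing concept with fault tolerance, event-time handling, and distributed execution built in.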

Streaming data analytics

Streaming data analytics involves the real-time analysis of continuously generated data streams, allowing organizations to extract meaningful insights and make informed decisions based on up-to-the-moment information.


Terraform

Terraform is an infrastructure-as-code tool that lets you build, change, and version cloud and on-prem resources safely and efficiently.

Time series data

Time series data consists of data points collected and recorded at regular time intervals, allowing for the analysis of trends and patterns over time. It is commonly used for gathering metrics and in applications like weather forecasting, financial analysis, and IoT sensor data analysis.
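For example, a simple moving average, a common smoothing step in time series analysis, can be computed over regularly sampled readings (the temperature values below are invented):

```python
def moving_average(values, window=3):
    """Average each run of `window` consecutive readings."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

temperatures = [20.0, 21.0, 23.0, 22.0, 24.0]  # e.g. hourly readings
smoothed = [round(x, 2) for x in moving_average(temperatures)]
print(smoothed)  # → [21.33, 22.0, 23.0]
```

Smoothing like this dampens short-term noise so that longer-term trends in the series become easier to see.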

Time series database

A time series database is a specialized database system designed for efficient storage and retrieval of time series data. It is optimized for querying and analyzing data points with timestamps, making it suitable for applications where historical data and trends are essential, such as in monitoring and analytics.