Glossary
Explore definitions of key terms and concepts
Apache Cassandra is an open-source, distributed, wide-column NoSQL database management system designed for handling large amounts of data across multiple servers.
Apache Flink is an open-source stream processing framework for big data processing and analytics. It supports event time processing and provides high-throughput, low-latency data processing.
Apache Kafka is an open-source event streaming platform used for building real-time data pipelines and streaming applications. It facilitates the processing of large-scale, real-time data feeds.
Bring Your Own Cloud (BYOC) is a deployment model where managed data services are deployed directly to the customer's own cloud account. This means that customers can enjoy the managed service experience, while all compute, storage and networking infrastructure services - and associated costs - remain under their direct control.
Caching is a technique used to store frequently accessed data in a temporary storage location to reduce access time and improve system performance. It helps in minimizing the time required to retrieve data by keeping a copy of it in a fast cache closer to the application that uses it.
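As a rough illustration, Python's built-in functools.lru_cache decorator keeps recent results of a function in an in-memory cache; fetch_profile below is a hypothetical stand-in for a slow lookup:

    from functools import lru_cache

    @lru_cache(maxsize=128)            # keep up to 128 results in memory
    def fetch_profile(user_id):
        # stand-in for an expensive operation such as a database query or HTTP call
        return {"id": user_id, "name": f"user-{user_id}"}

    fetch_profile(42)                  # computed and stored in the cache
    fetch_profile(42)                  # served from the cache, no recomputation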
ClickHouse is an open-source columnar database management system designed for fast analytical processing of large volumes of data. Its key use case is for real-time online analytical processing (OLAP).
Cloud infrastructure costs include the expenses associated with the provision and usage of cloud computing resources, such as virtual machines, storage, and network bandwidth.
A data pipeline is a series of processes and components that facilitate the automated and controlled movement of data from source to destination, often involving data extraction, transformation, and loading (ETL). Data pipelines are used to enable data integration, analysis, and storage.
Data streaming is a method of transmitting and processing data continuously in real time. It allows for the efficient and immediate transfer of information, making it valuable for various applications such as event-driven architectures and microservices, marketing personalization, real-time analytics, change data capture, real-time AI recommendations, monitoring, and much more.
DBAs, or Database Administrators, are professionals responsible for managing and maintaining databases, ensuring their performance, security, and reliability.
ETL, or Extract, Transform, Load, is a process used in data integration where data is extracted from source systems, transformed into a suitable format, and loaded into a target database, data warehouse, or data lake for analysis and reporting.
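A minimal Python sketch of the three steps, using made-up sample data and an in-memory SQLite table as the target:

    import sqlite3

    # Extract: rows as they might arrive from a source system (sample data)
    raw_orders = [("2024-01-05", "  alice ", "19.90"), ("2024-01-06", "Bob", "5.00")]

    # Transform: clean the values and convert them to the target types
    orders = [(date, name.strip().title(), float(amount)) for date, name, amount in raw_orders]

    # Load: write the cleaned rows into a target table
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (order_date TEXT, customer TEXT, amount REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    print(db.execute("SELECT customer, amount FROM orders").fetchall())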
Event and Processing Times refer to the timing aspects of data processing in a system, where "event time" is the time when an event occurs, and "processing time" is the time when the system processes that event.
Event streaming refers to the continuous flow of events or data in real-time, allowing systems to react to and process events as they occur. It is commonly used for applications like real-time data analysis, monitoring, and building event-driven architectures.
Event-driven architecture is a design approach where the flow of information and functionality is based on events or messages, with components reacting to events rather than relying on centralized control. This architecture is often used in real-time and distributed systems to improve scalability and responsiveness.
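One way to picture this is a small in-process publish/subscribe sketch in Python, where handlers react to named events rather than being called directly (all names are illustrative):

    handlers = {}

    def subscribe(event_type, handler):
        handlers.setdefault(event_type, []).append(handler)

    def publish(event_type, payload):
        # every component that registered for this event reacts independently
        for handler in handlers.get(event_type, []):
            handler(payload)

    subscribe("order_placed", lambda event: print("billing service saw:", event))
    subscribe("order_placed", lambda event: print("shipping service saw:", event))
    publish("order_placed", {"order_id": 1})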
Google BigQuery is a fully-managed, serverless data warehouse that enables super-fast SQL queries using the processing power of Google's infrastructure.
Hosted Data Streaming refers to a service where the infrastructure and resources for data streaming are provided and managed by a third-party hosting provider.
An in-memory database is a type of database management system that stores data primarily in the main memory (RAM) rather than on disk storage. This allows for significantly faster data access and processing compared to traditional disk-based databases. Examples include Valkey (open source alternative to Redis®) and Dragonfly.
Kafka messaging involves the use of Apache Kafka®, a distributed streaming platform, to facilitate the seamless and fault-tolerant exchange of real-time data between applications, enabling efficient data integration and communication.
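As a small sketch using the third-party kafka-python client, a producer publishes one message to a topic; the broker address (localhost:9092) and topic name ("events") are assumptions for the example:

    from kafka import KafkaProducer    # third-party package: kafka-python

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("events", b'{"user_id": 42, "action": "login"}')
    producer.flush()                   # block until the broker has acknowledged the message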
Karapace is an open source project that provides a schema registry and REST API for Apache Kafka®, making Kafka schemas easier to manage and monitor. It is built and maintained by Aiven and licensed under Apache 2.0.
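Because Karapace exposes a schema registry HTTP API, a schema can be registered with a plain HTTP request; the sketch below assumes a Karapace instance listening on localhost:8081 and an invented subject named clicks-value:

    import json, urllib.request

    schema = {"type": "record", "name": "Click",
              "fields": [{"name": "url", "type": "string"}]}
    payload = json.dumps({"schema": json.dumps(schema)}).encode()

    request = urllib.request.Request(
        "http://localhost:8081/subjects/clicks-value/versions",
        data=payload,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )
    print(urllib.request.urlopen(request).read())   # response contains the assigned schema ID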
A key-value database is a type of NoSQL database that uses a simple key-value pair mechanism to store data. Each key is unique and maps directly to a single value, enabling efficient retrieval and storage of data. Examples include Valkey (open source alternative to Redis®) and Dragonfly.
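As a sketch, storing and reading a value by key with the redis-py client against a running Valkey- or Redis-compatible server (the server address and key names are assumptions):

    import redis                        # third-party package: redis, compatible with Valkey servers

    store = redis.Redis(host="localhost", port=6379, decode_responses=True)
    store.set("session:42", "alice")    # each unique key maps to a single value
    print(store.get("session:42"))      # -> "alice"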
Klaw is a web-based, fully open source data governance toolkit for Apache Kafka® topic and schema governance. Klaw helps Kafka admins add and define roles for Kafka users, create Kafka Topics, manage schemas, authorize producers and consumers, manage connectors, and more.
Kubernetes is an open-source container orchestration platform used to automate the deployment, scaling, and management of containerized, cloud-native applications.
Microservices are a software architectural style in which a system is composed of small, independent services that communicate over well-defined APIs, promoting flexibility and scalability.
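For example, a single, narrowly scoped service might expose its functionality over HTTP; this sketch uses the Flask framework and an invented /orders endpoint:

    from flask import Flask, jsonify    # third-party package: Flask

    app = Flask(__name__)

    @app.route("/orders/<int:order_id>")
    def get_order(order_id):
        # other services call this API instead of sharing code or a database
        return jsonify({"order_id": order_id, "status": "shipped"})

    if __name__ == "__main__":
        app.run(port=5001)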
MySQL® is an open-source relational database management system (RDBMS) widely used for storing and managing structured data. It uses SQL (Structured Query Language) for database management and is known for its reliability and performance.
PostgreSQL is an open source relational database management system (RDBMS) with a strong reputation for reliability, feature robustness, and performance. Frequently called Postgres, it is SQL compliant, provides atomicity, consistency, isolation, and durability (ACID) properties, and is commonly used as a large datastore for analytics and web services with many concurrent users.
Python Data Stream refers to the continuous flow of data in Python programming, often used in scenarios like data processing and analysis.
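A generator is a common way to model such a stream in Python, producing records one at a time instead of loading a whole dataset up front (the sensor readings below are simulated):

    import random, time

    def sensor_stream():
        # endless generator standing in for a continuous data source
        while True:
            yield {"ts": time.time(), "value": random.random()}
            time.sleep(0.1)

    for i, reading in enumerate(sensor_stream()):
        print(reading)
        if i >= 4:                      # stop after a few readings for the example
            break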
Real-Time Analytics involves the analysis of data as it is generated to provide immediate insights and decision-making capabilities.
Real-time data refers to information that is available immediately as it is generated or becomes relevant, without any significant delay. Real-time data is essential for applications like stock trading, monitoring, and dynamic decision-making.
rsyslog is open-source software used for centralizing and managing log messages in a Unix or Unix-like environment.
Stream processing is the practice of processing and analyzing data as it is continuously generated, without the need to store and batch-process it first. It is suitable for real-time applications like fraud detection, recommendation systems, and IoT data processing.
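A toy Python sketch of the idea: keep a small rolling window over incoming events and update an aggregate as each one arrives, rather than batch-processing stored data (the event values are made up):

    from collections import deque

    window = deque(maxlen=5)                        # keep only the five most recent events
    for event in [3.2, 3.4, 9.8, 3.1, 3.3, 3.0]:    # stand-in for an unbounded stream
        window.append(event)
        rolling_avg = sum(window) / len(window)
        print(f"event={event}  rolling average={rolling_avg:.2f}")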
Streaming data analytics involves the real-time analysis of continuously generated data streams, allowing organizations to extract meaningful insights and make informed decisions based on up-to-the-moment information.
Terraform is an infrastructure as code tool that lets you build, change, and version cloud and on-prem resources safely and efficiently.
Time series data consists of data points collected and recorded at regular time intervals, allowing for the analysis of trends and patterns over time. It is commonly used for gathering metrics and in applications like weather forecasting, financial analysis, and IoT sensor data analysis.
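As a small Python illustration with invented readings, hourly temperatures can be downsampled to one value per day, a typical time series operation:

    from collections import defaultdict
    from statistics import mean

    readings = [("2024-06-01T00:00", 14.2), ("2024-06-01T12:00", 21.5),
                ("2024-06-02T00:00", 13.8), ("2024-06-02T12:00", 22.9)]

    by_day = defaultdict(list)
    for timestamp, value in readings:
        by_day[timestamp[:10]].append(value)        # group by the date part of the timestamp

    for day, values in sorted(by_day.items()):
        print(day, round(mean(values), 1))          # daily average temperature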
A time series database is a specialized database system designed for efficient storage and retrieval of time series data. It is optimized for querying and analyzing data points with timestamps, making it suitable for applications where historical data and trends are essential, such as in monitoring and analytics.
Valkey is an open source (BSD) high-performance key-value datastore based on the OSS version of the popular Redis® database, which recently changed its licensing model. Valkey supports a variety of workloads such as caching and message queues, and can act as a primary database.