Cut your TCO by up to 80%
Unify real-time streaming and the Lakehouse
Experience the power of Inkless™
Diskless topics for Apache Kafka®
Bring your own cloud (BYOC) deployed as a stateless service, directly in your VPC, with no disks to manage.
Inkless implements diskless topics that write data directly to object storage, like AWS S3, using a leaderless architecture where any broker can handle any partition. This eliminates expensive local disk replication for topic data, though brokers still leverage small amounts of disk for metadata.
Apache Kafka stores and replicates data across multiple broker servers using local disks, with a designated leader broker handling writes to each partition while follower brokers maintain copies. This architecture requires cross-zone disk replication for high availability, which generates expensive network traffic as data is duplicated across different cloud availability zones.
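To make that cost driver concrete, here is a minimal back-of-the-envelope sketch in Python of how much cross-AZ traffic classic replication generates. The 1 GiB/s ingest figure is illustrative, and it assumes producers are spread evenly across zones and consumers read in-zone; it is not the calculator's actual model.

```python
# Back-of-the-envelope cross-AZ traffic for classic Kafka: RF=3 across 3 AZs.
# Assumptions (illustrative): producers spread evenly across zones, consumers
# read in-zone, so only replication and the producer-to-leader hop cross AZs.

ingest_gib_per_s = 1.0  # sustained produce rate (illustrative)

# Each partition has 2 followers; with brokers spread across 3 AZs,
# both follower copies land in a different AZ than the leader.
replication_crossings = 2.0

# A producer sits in the leader's AZ only 1/3 of the time on average.
producer_to_leader_crossings = 2.0 / 3.0

cross_az_gib_per_s = ingest_gib_per_s * (replication_crossings
                                         + producer_to_leader_crossings)
print(f"{cross_az_gib_per_s:.2f} GiB/s of cross-AZ traffic "
      f"for {ingest_gib_per_s:.1f} GiB/s of ingest")
# -> 2.67 GiB/s: every byte produced crosses zone boundaries ~2.7 times.
```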
Tired of unpredictable Kafka bills and the high cost of data retention? Your traditional Kafka setup is costing you more than you think. Use our interactive calculator to see how Aiven's diskless architecture can dramatically reduce your TCO. Input your current usage and discover your potential savings in minutes.
-$538,492 /month compared to Kafka 3AZ, broken down as follows:

Cost category | Monthly savings |
|---|---|
Network | -$225,994
Storage | -$221,486
Broker | -$51,138
Personnel | -$39,875
One Kafka cluster to rule all streams, slashing TCO by up to 80% and staying 100% compatible with every client, connector, and tool you already use.
Run sub-100 ms streams and 80% cheaper batch topics in the same cluster. No silos, no cluster sprawl: just one cloud-native engine for every workload.
Leaderless architecture writes straight to S3/GCS, erasing cross‑AZ fees and disks. Scale storage and compute independently, and pay only for what you actually use.
Keep every client, connector, and tool — just upgrade and go. It’s upstream-aligned, fully open source, and offers zero lock-in. Kafka, reimagined.
Metric | Max seen in production | Production limits (tested) | Production limits (future) |
|---|---|---|---|
Data In | 1.8 GB/sec | 10 GB/sec | Unlimited |
P99 Diskless Latency | 1,500 ms | 2,000 ms | 800 ms
Partitions | 68,000 | 154,000 | Unlimited |
Connections | 120,000 | Unlimited | Unlimited |
Inkless is the name of Aiven's innovative Apache Kafka service, purpose-built for the cloud. Inkless modernizes Kafka's design by incorporating diskless topics to slash running costs.
Why the name Inkless? Apache Kafka draws its name from Franz Kafka, the novelist whose ink-on-paper craft mirrors traditional data systems' reliance on disk I/O. Inkless Kafka reimagines this paradigm, replacing disk-bound writes with cloud-native object storage and enabling data persistence through scalable, decentralized architectures.
The Kafka cost estimator reflects a real-world deployment across three availability zones (AZs). It includes key features like Tiered Storage and Fetch-from-Follower, SSD-backed brokers with built-in capacity headroom, and realistic cloud pricing for compute, storage, and cross-AZ traffic.
When diskless topics are selected, the model also accounts for the lower operational effort required to run these clusters.
1. High availability and replication
All Kafka clusters modeled in the calculator are spread across three AZs. Each topic uses a replication factor of 3, ensuring durability and availability.
This replication level is fixed in the estimator and matches Apache Kafka’s default recommendation for production environments.
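As a concrete illustration, here is a minimal Python sketch, using the confluent_kafka client, of creating a topic with the replication factor of 3 that the estimator assumes; the broker address and topic name are placeholders.

```python
from confluent_kafka.admin import AdminClient, NewTopic

# Placeholder bootstrap address; replace with your cluster's endpoint.
admin = AdminClient({"bootstrap.servers": "broker-1:9092"})

# Replication factor 3 matches the estimator's fixed assumption and
# Apache Kafka's recommended default for production durability.
topic = NewTopic("orders", num_partitions=6, replication_factor=3)

futures = admin.create_topics([topic])
for name, future in futures.items():
    future.result()  # raises if creation failed
    print(f"Created topic {name} with replication.factor=3")
```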
2. Kafka features enabled by default
The estimator assumes two key features are always on (a configuration sketch follows the list):

- Tiered Storage, which offloads older log segments from broker disks to object storage.
- Fetch-from-Follower (KIP-392), which lets consumers fetch from a replica in their own availability zone instead of the leader, cutting cross-AZ read traffic.
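A minimal sketch of what enabling these features typically looks like, assuming Kafka 3.6+ with a remote storage plugin already configured; the property names are the upstream Apache Kafka ones, and the addresses are placeholders.

```python
from confluent_kafka import Consumer

# Broker-side settings (server.properties), shown here as comments:
#   replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
#   remote.log.storage.system.enable=true   # plus a remote storage manager plugin
#
# Tiered Storage is then enabled per topic with:
#   remote.storage.enable=true

# Consumer side of Fetch-from-Follower: declare the client's AZ so the
# broker can route fetches to an in-zone replica.
consumer = Consumer({
    "bootstrap.servers": "broker-1:9092",   # placeholder address
    "group.id": "analytics",
    "client.rack": "us-east-1a",            # the consumer's availability zone
})
consumer.subscribe(["orders"])
```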
3. Disk and capacity guardrails
To reflect realistic operational behavior, the estimator includes resource usage limits for each broker:
Constraint | Reason |
|---|---|
SSD-only storage | HDDs increase tail latency; SSDs match Aiven’s Kafka fleet |
≤ 40% disk usage per broker | Leaves headroom for rebalancing and unexpected traffic spikes |
≤ 80% CPU and network utilization | Reduces jitter and prevents resource throttling |
Maximum disk size: 16 TiB | Larger EBS volumes slow down restarts significantly on AWS |
If a workload can tolerate HDD latency, the calculator may favor Diskless Topics. Offloading data to remote storage removes the local disk I/O bottleneck.
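To show how these guardrails drive broker counts, here is a minimal sizing sketch; the workload numbers are illustrative and the logic is a simplification of what the real estimator does.

```python
TIB = 1024**4

def min_brokers_for_storage(retained_bytes: int,
                            max_disk_bytes: float = 16 * TIB,
                            max_disk_utilization: float = 0.40) -> int:
    """Brokers needed so no disk exceeds 40% of a 16 TiB volume."""
    usable_per_broker = int(max_disk_bytes * max_disk_utilization)  # 6.4 TiB
    return -(-retained_bytes // usable_per_broker)  # ceiling division

# Illustrative workload: 100 MiB/s ingest, RF=3, 7-day retention.
ingest = 100 * 1024**2
retained = ingest * 3 * 7 * 24 * 3600
print(min_brokers_for_storage(retained))  # -> 28 brokers for storage alone
```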
4. Cluster sizing and partition limits
To avoid excessive partition reassignments and reduce mean time to recovery (MTTR), the calculator applies Kafka community sizing guidance on per-broker and per-cluster partition counts. When these thresholds are exceeded, the model assumes a second cluster is created.
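A sketch of that decision logic follows. The thresholds are illustrative, taken from commonly cited community guidance for ZooKeeper-based clusters, not the calculator's actual values.

```python
# Illustrative community guidance (not the calculator's exact values):
# ~4,000 partitions per broker and ~200,000 per cluster are often cited.
MAX_PARTITIONS_PER_BROKER = 4_000
MAX_PARTITIONS_PER_CLUSTER = 200_000

def clusters_needed(total_partitions: int, brokers_per_cluster: int) -> int:
    per_cluster_cap = min(MAX_PARTITIONS_PER_CLUSTER,
                          brokers_per_cluster * MAX_PARTITIONS_PER_BROKER)
    return -(-total_partitions // per_cluster_cap)  # ceiling division

print(clusters_needed(total_partitions=250_000, brokers_per_cluster=60))  # -> 2
```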
5. Cross-AZ network pricing
The estimator uses actual cloud provider pricing for cross-AZ data transfer:
Cloud provider | Cross-AZ cost (per GiB) |
|---|---|
AWS | $0.02 (bidirectional) |
Google Cloud | $0.01 |
Azure | $0.00 (within region only) |
Cross-region traffic is not included by default. To model inter-region mirroring, enable the corresponding option in the calculator.
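Applying these rates, here is a minimal sketch of the cross-AZ line item for a classic cluster on AWS; the ingest rate is illustrative, and the sketch counts only replication traffic (consumers are assumed to read in-zone via Fetch-from-Follower).

```python
# Cross-AZ replication cost on AWS at $0.02/GiB (bidirectional),
# with RF=3 across 3 AZs: each byte is copied to 2 other zones.
ingest_mib_per_s = 500            # illustrative sustained ingest
seconds_per_month = 30 * 24 * 3600

cross_az_gib = ingest_mib_per_s * seconds_per_month * 2 / 1024
monthly_cost = cross_az_gib * 0.02
print(f"${monthly_cost:,.0f}/month")  # -> $50,625/month for replication alone
```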
6. Operational effort
The estimator includes operational staffing assumptions based on Aiven’s internal telemetry:
Cluster type | Staffing estimate (FTE per 100 MiB/s sustained ingest) |
|---|---|
Classic Kafka | 0.5 FTE |
Diskless Kafka | 0.125 FTE |
Clusters using only Diskless Topics require less manual intervention. By offloading data to object storage, diskless topics remove the need for local disk management. This simplification also reduces the impact of incidents. The staffing estimates used in the calculator are intentionally conservative compared to public total cost of ownership (TCO) models.
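Translated into arithmetic, here is a sketch of the staffing line using the table's factors; the 400 MiB/s workload is illustrative.

```python
# FTE factors from the table above, per 100 MiB/s of sustained ingest.
FTE_PER_100_MIBS = {"classic": 0.5, "diskless": 0.125}

def staffing_fte(ingest_mib_per_s: float, cluster_type: str) -> float:
    return (ingest_mib_per_s / 100) * FTE_PER_100_MIBS[cluster_type]

for kind in ("classic", "diskless"):
    print(kind, staffing_fte(400, kind))  # classic: 2.0 FTE, diskless: 0.5 FTE
```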
BYOC (Bring Your Own Cloud) is Aiven’s deployment model that runs Diskless Topics - and optionally other Aiven services - directly inside your own cloud environment. This model gives you full control over your environment while Aiven manages operations through its control plane.
BYOC tiers are flat-fee subscription levels based on the sustained compressed ingress throughput of your Diskless Topics BYOC deployment.
Each tier includes the same operational benefits. Pricing scales with the volume of data (in MB/s) streamed through Kafka: higher throughput maps to a higher tier.
Running Kafka at scale requires engineering expertise and operational maturity. With Diskless Topics in BYOC, data is stored in your own object storage, and compute runs in your own cloud account—enabling you to benefit from your provider’s cost optimizations. Aiven handles deployment, monitoring, upgrades, and recovery.
The pricing model is designed to keep the cost per MB/s low while meeting Aiven’s reliability and automation standards. It simplifies operations and offers a cost-effective alternative to managing Kafka infrastructure in-house.
Sample BYOC tiers
The following tiers represent 95th percentile sustained throughput ranges. They are not hard limits. Occasional short bursts above the defined range are allowed. If sustained traffic increases, you can upgrade tiers without downtime.
Tier | Sustained ingest (MB/s) | Example use cases |
|---|---|---|
T1 – Pilot | ≤ 20 | CI/CD pipelines, proof-of-concept environments |
T2 – Starter | 21 – 50 | Single-team apps, small SaaS products |
T3 – Growth | 51 – 100 | Multi-team workloads, analytics pipelines |
T4 – Scale | 101 – 200 | Regional IoT ingestion, clickstream data |
T5 – Large | 201 – 500 | Heavy telemetry, personalization engines |
T6 – XL | 501 – 1 000 | Ad tech, financial data feeds |
T7 – XXL | 1 001 – 1 500 | Game backends, national-scale telemetry |
T8 – Ultra | 1 501 – 2 000 | CDN logs, multi-region streaming workloads |
Custom | > 2 000 | High-throughput systems beyond standard tiers |
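As an illustration of how sustained throughput maps to a tier, here is a small lookup sketch; the tier names and boundaries come from the table above, while the function itself is hypothetical.

```python
# Upper bounds in MB/s of sustained (95th percentile) ingest, from the table.
TIERS = [
    ("T1 - Pilot", 20), ("T2 - Starter", 50), ("T3 - Growth", 100),
    ("T4 - Scale", 200), ("T5 - Large", 500), ("T6 - XL", 1000),
    ("T7 - XXL", 1500), ("T8 - Ultra", 2000),
]

def byoc_tier(sustained_mb_per_s: float) -> str:
    for name, upper_bound in TIERS:
        if sustained_mb_per_s <= upper_bound:
            return name
    return "Custom"  # beyond 2,000 MB/s

print(byoc_tier(75))    # -> T3 - Growth
print(byoc_tier(2500))  # -> Custom
```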
Flexibility and cloud consistency
These tiers are designed to be flexible starting points, not rigid constraints, and Aiven's team can adjust them to fit your workload.
The pricing remains consistent across AWS, Google Cloud, and Azure. You retain any savings from reserved instances, committed use discounts, or storage tiering available in your own account.
Diskless topics are now an integrated feature of the Aiven for Kafka service, specifically for Bring Your Own Cloud (BYOC) deployments. To use them, enable the feature when creating your service, either in the Console UI or via the CLI, API, or Terraform.
Diskless topics are fully compatible with traditional Kafka topics. They use the same producer and consumer APIs, preserve message ordering and offsets, support exactly-once semantics, and can run alongside classic topics in the same cluster—no application changes needed.
The difference is in how data is stored. Instead of writing to broker disks, diskless topics stream data directly to cloud object storage.
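Because the client protocol is unchanged, producing to a diskless topic looks exactly like producing to any Kafka topic. A minimal Python sketch with the confluent_kafka client follows; the broker address and topic name are placeholders.

```python
from confluent_kafka import Producer

# Standard Kafka producer; nothing diskless-specific on the client side.
producer = Producer({"bootstrap.servers": "broker-1:9092"})  # placeholder

def on_delivery(err, msg):
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()}[{msg.partition()}] @ {msg.offset()}")

# "clickstream" could be a diskless topic; the produce call is identical.
producer.produce("clickstream", key="user-42", value="page_view",
                 callback=on_delivery)
producer.flush()  # waits for the broker (and object storage) to acknowledge
```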
This approach offers three key benefits:

- Lower network cost: writing straight to object storage eliminates cross-AZ replication traffic.
- Cheaper retention: object storage costs far less per GB than SSD-backed broker disks.
- Independent scaling: compute and retention scale separately, so you pay only for what you actually use.
Diskless topics are not ideal for ultra-low-latency workloads that require sub-500 ms end-to-end delivery. For those scenarios, traditional disk-based Kafka is a better fit.
However, for use cases that can tolerate one to two seconds of latency between producing and consuming data, diskless topics offer significant advantages. They reduce storage costs and allow compute and retention to scale independently.
Where diskless topics shine
Workload | Why the trade-off pays off | Supporting insight |
|---|---|---|
Application & infrastructure logs | High-volume data with long retention requirements; real-time dashboards can tolerate some delay. | Object storage offers lower per-GB cost with high durability (for example, "eleven nines"). |
Telemetry and metrics | Time-series data (such as Prometheus, OpenTelemetry, or similar) streams continuously, but dashboards typically refresh every few seconds. | Remote tiers handle sustained write spikes without re-sharding disks. |
IoT sensor data | Millions of small messages; cost is a bigger concern than sub-second speed. | Typical object storage write latency (100–200 ms) is acceptable for these workloads. |
Clickstream and user analytics | Web/mobile events feed near-real-time dashboards and nightly batch jobs. | Tiered storage decouples compute from retention as data volume grows. |
Change-data capture (CDC) | Slight lag is fine when capturing database changes into data lakes. | Object storage aligns with downstream formats like Parquet and Iceberg. |
ML feature logging and training | Models train on massive histories; replaying events is asynchronous. | Diskless keeps costs low while preserving Kafka ordering and semantics. |
Security trails and audit logs | Compliance requires long retention, but speed is less important. | Offloading to object storage avoids expanding broker disk usage. |
Backup and replay queues | Used for batch workflows or disaster recovery; prioritizes durability over speed. | Data streams directly into durable object storage for long-term recovery. |