Defending Apache Kafka

Secure Your Apache Kafka Infrastructure: Best Practices for Planning, Monitoring, and Testing to Protect Against Security Threats and Data Integrity Issues.

You’ve likely placed Apache Kafka® at the center of your company's data universe. Understandably so: it serves data needs across the company and integrates easily with other tools, thanks to Apache Kafka® Connect. But what happens when there are data integrity issues, when users have access to much more than they should, and, most importantly, when you are under attack?

To defend against attacks, we'll explore best practices for planning your data flow and designing your Kafka infrastructure to prevent security issues before they arise. And if you, like many, already have Kafka set up, fear not: we'll also discuss ways to execute, monitor, and test your existing Kafka deployments so that you can improve their security, too!

Planning & Designing

Kafka is often woven throughout your data infrastructure, touching applications, microservices, monitoring, and real-time analytics. With such a broad reach, how you plan to implement the technology and defend your data infrastructure from the start sets you up for success in the long run. Planning well before integrating Kafka into any system allows you to evaluate the security features of the Kafka version being used, understand how it will be integrated with other systems, and define a responsibility matrix to clarify who has access to what within Kafka.

If poorly designed, however, a Kafka deployment can create problems beyond security vulnerabilities: data integrity issues from improper schema management, a lack of proper data retention policies, operational issues, poor monitoring and logging practices, and reduced availability and reliability.

From the beginning, make sure you use the latest Kafka version (3.7.0 at the time of writing). If you are not on 3.7, consider upgrading, and check the Apache Kafka Security Vulnerabilities page from the Apache Software Foundation to make sure you are not affected by any known vulnerabilities.

Put effort into defining what good data, sound data patterns, and good data pipelines look like. Vital questions to ask yourself and the stakeholders setting this up include:

  • How will your pipeline ingest the data? Consider whether you need real-time streams or batch processing. The way your pipeline will ingest the data will affect the security and tools needed.
  • How will your pipeline process and deliver the data? Determine whether your pipeline requires event processing, transformations, or data enrichment. Consider how you will ensure the data is delivered accurately and on time to the necessary endpoints.
  • What does the data look like? The shape of your data impacts the implementation of Kafka and the tools you use. Analyze the data schema, format, and any potential changes over time.

To learn more about determining the shape of your data, read Navigating the Data Maze: 5 Essential Questions to Guide Your Tool Selection.

Estimating the attacker's intent is an often overlooked mindset when planning. Put yourself in their shoes: what is an attacker trying to do?

  • Map your data infrastructure?
  • Find openings in your infrastructure?
  • Find one or two ways to connect to a user?
  • Find patterns in your infrastructure that would let them reboot the system and knock out functionality?

Good data, data patterns, and data pipelines require cross-departmental effort, so involve all possible stakeholders when defining your plan.

When defining a responsibility matrix to clarify who has access to what within Kafka, consider role-based access control (RBAC) and access control lists (ACLs). Knowing who has what access allows you to understand where attacks are coming from.

Kafka accepts many kinds of data into its topics. Using Kafka schemas to define expectations about the shape of the data that should be passed allows you to monitor the data moving through it more effectively. For example, you might only accept records that contain a name, an ID, and a numeric value. Once those expectations exist, you can monitor against them and investigate when anomalies occur.
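
As a concrete illustration, here is a minimal sketch of that expectation expressed as an Avro schema built with Avro's Java SchemaBuilder. The record name and the field types (string name, long id, double value) are assumptions for the example, not prescriptions from this article.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class RecordShape {
    public static void main(String[] args) {
        // Expected shape: every record must carry a name, an ID, and a numeric value.
        Schema schema = SchemaBuilder.record("Measurement")
            .fields()
            .requiredString("name")
            .requiredLong("id")
            .requiredDouble("value")
            .endRecord();

        GenericRecord measurement = new GenericData.Record(schema);
        measurement.put("name", "sensor-42");
        measurement.put("id", 42L);
        measurement.put("value", 21.5);

        // Prints false (an anomaly) if a field is missing or has the wrong type.
        System.out.println(GenericData.get().validate(schema, measurement));
    }
}
```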

Planning also covers the execution details: managing your setup as infrastructure as code, encrypting data, isolating workloads, and enforcing strict access control. Instead of relying on defaults, customize the settings for your organization's needs; for example, create separate users to distinguish the different pieces of functionality you're implementing. Then monitor both the infrastructure and the data being passed through it.
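
To make that concrete, here is a minimal sketch of a customized, per-service client configuration in Java. The broker address, service account name, and file paths are placeholders, and SCRAM over TLS is just one reasonable choice, not the only one.

```java
import java.util.Properties;

public class InventoryServiceClientConfig {
    // Builds the client settings for one hypothetical service account.
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka.internal.example.com:9093");
        // Encrypt traffic and authenticate this specific service account.
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "SCRAM-SHA-512");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            + "username=\"inventory-service\" password=\"<from-your-secret-store>\";");
        // Trust only your own CA rather than whatever the JVM ships with.
        props.put("ssl.truststore.location", "/etc/kafka/certs/truststore.jks");
        props.put("ssl.truststore.password", "<from-your-secret-store>");
        return props;
    }
}
```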

You must also consider the different systems Kafka integrates with and how to secure those connections. For Kafka Connect in particular, that includes connector plugins, source connectors, sink connectors, worker configuration, the distributed architecture, and the REST API.

Finally, test everything. Test the plans, the execution, and the monitoring. You should test continuously, ensuring everything is ready when something happens. Testing frameworks you could use are Testing Kafka Streams, Testcontainers for Kafka, and Embedded Kafka.
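
For instance, the Kafka Streams test utilities let you exercise a processing topology without a running cluster. Below is a minimal sketch; the topology (which simply drops oversized payloads), the topic names, and the size limit are all invented for this example.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class FilterTopologyTest {
    public static void main(String[] args) {
        // Hypothetical topology: drop oversized payloads before they reach downstream consumers.
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("payments-raw", Consumed.with(Serdes.String(), Serdes.String()))
               .filter((key, value) -> value != null && value.length() < 10_000)
               .to("payments-clean", Produced.with(Serdes.String(), Serdes.String()));
        Topology topology = builder.build();

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "size-filter-test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted by the test driver

        try (TopologyTestDriver driver = new TopologyTestDriver(topology, props)) {
            TestInputTopic<String, String> in =
                driver.createInputTopic("payments-raw", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> out =
                driver.createOutputTopic("payments-clean", new StringDeserializer(), new StringDeserializer());

            in.pipeInput("order-1", "{\"id\":1,\"value\":9.99}");
            // The small record should pass the filter and appear on the output topic.
            System.out.println(out.readKeyValue());
        }
    }
}
```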

Of course, planning your Kafka implementation up front is not always feasible. Even with an existing deployment, the steps above help you analyze what changes you can make to prevent attacks.

Monitoring

The infrastructure and data passing through Kafka must be monitored to detect anomalies and ensure only expected, secure data patterns are allowed. This involves defining and monitoring standard data patterns to detect deviations that might indicate an attack, unauthorized access, or general misuse.
When you are monitoring infrastructure, consider monitoring Kafka brokers and Kafka nodes. There are other questions to ask, such as:

  • Network data considerations:
    • What data is passing through the network?
    • What are the IP addresses of the nodes in the Kafka cluster?
    • Are new Kafka brokers or nodes being added to the cluster and starting to communicate with existing nodes?
  • Access Control Lists (ACLs):
    • What access control lists are in place for your Kafka topics and other resources? (A sketch for auditing them programmatically follows this list.)
    • Are you rotating your credentials frequently enough to keep them secure?
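
One way to answer the ACL questions above is to snapshot the current bindings and compare them against your responsibility matrix. This is a minimal sketch using Kafka's Java Admin client; the bootstrap address is a placeholder, and a secured cluster will also need the client security settings discussed earlier.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclBindingFilter;

import java.util.Map;

public class AclAudit {
    public static void main(String[] args) throws Exception {
        // Placeholder address; reuse your secured client properties here.
        Map<String, Object> conf = Map.of("bootstrap.servers", "kafka.internal.example.com:9093");
        try (Admin admin = Admin.create(conf)) {
            // AclBindingFilter.ANY matches every binding, giving a full snapshot
            // to compare against your responsibility matrix.
            for (AclBinding binding : admin.describeAcls(AclBindingFilter.ANY).values().get()) {
                System.out.printf("%s -> %s%n", binding.pattern(), binding.entry());
            }
        }
    }
}
```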

When monitoring your data, don't monitor only the shape of the data; monitor the messages themselves. For example, watch message sizes: the messages passing through a given Kafka topic are usually roughly the same size, and their volume and frequency stay roughly consistent from one day to the next. When you suddenly start receiving messages three times the usual size, you should be alerted and look into the cause of the change (see the sketch after the questions below).

Questions to ask about the message changes:

  • What are the schemas of the new messages?
  • Do they follow the rules you set before, or are you receiving messages with different schemas?
  • Are you receiving messages with a dramatically different character count? Were you expecting a change in message size, frequency, or volume, such as a switch in the originating system sending those messages?
  • What is the throughput?
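
Here is a minimal sketch of the size check described above, implemented as a small Java watchdog consumer. The topic name, consumer group, broker address, and the "three times a typical 4 KB payload" threshold are all assumptions for illustration.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Map;

public class MessageSizeWatch {
    // Assumed baseline: a "typical" message is around 4 KB.
    private static final int TYPICAL_BYTES = 4 * 1024;

    public static void main(String[] args) {
        Map<String, Object> props = Map.of(
            "bootstrap.servers", "kafka.internal.example.com:9093",
            "group.id", "size-watchdog",
            "key.deserializer", ByteArrayDeserializer.class.getName(),
            "value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<byte[], byte[]> rec : records) {
                    int size = rec.serializedValueSize();
                    if (size > 3 * TYPICAL_BYTES) {
                        // Hook this into your real alerting instead of stderr.
                        System.err.printf("Oversized message on %s-%d@%d: %d bytes%n",
                            rec.topic(), rec.partition(), rec.offset(), size);
                    }
                }
            }
        }
    }
}
```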

You can set expectations for how your messages are usually distributed. By adjusting and validating these expectations, you can determine if you’re observing normal behavior or something unusual. Once these expectations are defined, you need to test them, including verifying that your infrastructure as code works, ensuring you and your team can recreate the same infrastructure elsewhere, and measuring how long it takes.

A few ways to monitor your Kafka are:

  • Classic Kafka: used for Application Performance Management (APM) to trace requests to and from Kafka clients automatically.
  • Karapace: used for schema management through schema registries.
    • Schema registries give you one centralized location to define the structure of your data and expose global, request, and response metrics you can track.
  • Grafana: used to visualize the key metrics you are monitoring.
  • Klaw: used to govern access to Kafka topics and schemas, supporting data consistency, reliability, and secure access in line with organizational policies.
  • Aiven platform monitoring: used to monitor metrics, logs, and alerts in Aiven.

Securing

To defend your Kafka instance, implement basic security practices: use infrastructure as code for reproducibility and auditability, encrypt data both in transit and at rest, isolate workloads to minimize the attack surface, apply strict access control, and use schemas to enforce data integrity.

If you don't secure Kafka's buffers and queues, an attacker can inspect the data sitting in them and gain information without ever having direct access to an endpoint in your ecosystem. Just by sniffing the network, an attacker can disrupt service, steal data, and possibly capture enough credentials to do real damage.

Encrypt data both in transit and at rest to protect sensitive information from being intercepted or accessed by unauthorized parties. Encrypt everything you can. To isolate Kafka's internal network from the external network, ensure that your producers and consumers communicate only with their associated Kafka brokers; this prevents data leakage and limits unnecessary exposure. Additionally, try isolating the network segments for each component: a notification service fed by Kafka, for example, doesn't need to interact directly with the database. By isolating these communication pathways, you can apply stricter controls to the data flow and tighten access control.
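
One way to express that separation on the broker side is to use distinct listeners for inter-broker and client traffic on different network segments. The sketch below just generates the relevant broker properties from Java, for consistency with the other examples; the hostnames, ports, and protocol choices are placeholders, and in practice you would manage these values through your infrastructure-as-code tooling.

```java
import java.io.FileWriter;
import java.util.Properties;

public class BrokerListenerIsolation {
    public static void main(String[] args) throws Exception {
        Properties broker = new Properties();
        // Two listeners on separate network segments: one for broker-to-broker
        // traffic, one for producers and consumers.
        broker.setProperty("listeners", "INTERNAL://0.0.0.0:9092,CLIENT://0.0.0.0:9093");
        broker.setProperty("advertised.listeners",
            "INTERNAL://kafka-1.internal.example.com:9092,CLIENT://kafka-1.example.com:9093");
        // Encrypt both paths; clients additionally authenticate over SASL.
        broker.setProperty("listener.security.protocol.map", "INTERNAL:SSL,CLIENT:SASL_SSL");
        broker.setProperty("inter.broker.listener.name", "INTERNAL");

        // Write the fragment out so it can be merged into server.properties.
        try (FileWriter out = new FileWriter("broker-listeners.properties")) {
            broker.store(out, "listener isolation sketch");
        }
    }
}
```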

Next, always separate read patterns from write patterns. If a service only needs to read from a topic, define a user and an ACL that allow it to read only from that topic. Follow the principle of least privilege and only grant access where it's needed.
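
For instance, a read-only grant for a hypothetical reporting service might look like the following sketch using Kafka's Java Admin client. The principal, topic, and broker address are placeholders, and the consumer-group grant reflects that a consumer also needs READ on its group.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

import java.util.List;
import java.util.Map;

public class GrantReadOnly {
    public static void main(String[] args) throws Exception {
        Map<String, Object> conf = Map.of("bootstrap.servers", "kafka.internal.example.com:9093");
        try (Admin admin = Admin.create(conf)) {
            // Allow the reporting service to read the topic, and nothing else.
            AclBinding topicRead = new AclBinding(
                new ResourcePattern(ResourceType.TOPIC, "payments", PatternType.LITERAL),
                new AccessControlEntry("User:reporting-service", "*",
                    AclOperation.READ, AclPermissionType.ALLOW));
            // Consumers also need READ on the consumer group they use.
            AclBinding groupRead = new AclBinding(
                new ResourcePattern(ResourceType.GROUP, "reporting-service", PatternType.LITERAL),
                new AccessControlEntry("User:reporting-service", "*",
                    AclOperation.READ, AclPermissionType.ALLOW));

            admin.createAcls(List.of(topicRead, groupRead)).all().get();
        }
    }
}
```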

Finally, always use schemas. You can pass almost any data to Kafka, but schemas let you define expectations about the shape of that data, describe a specific kind of record, and flag any anomalies. As with monitoring, Karapace is a great way to run a schema registry: one centralized location to define the structure of your data, with global, request, and response metrics you can measure.
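
Because Karapace exposes a Confluent-compatible Schema Registry REST API, registering an expected record shape can be as simple as a POST to the subject's versions endpoint. This is a minimal sketch in Java; the registry URL and subject name are placeholders, and any authentication headers your registry requires are omitted.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterSchema {
    public static void main(String[] args) throws Exception {
        // Placeholder registry URL and subject; point these at your Karapace instance.
        String registry = "https://karapace.internal.example.com:8081";
        String subject = "payments-value";

        // The same "name, id, value" shape used earlier, as an Avro schema document.
        String schemaJson = "{\"type\":\"record\",\"name\":\"Measurement\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"value\",\"type\":\"double\"}]}";
        // The registry expects the schema itself wrapped as a JSON string under "schema".
        String body = "{\"schema\": \"" + schemaJson.replace("\"", "\\\"") + "\"}";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(registry + "/subjects/" + subject + "/versions"))
            .header("Content-Type", "application/vnd.schemaregistry.v1+json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```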

Testing

Now that we've discussed planning, monitoring, and security hygiene, test everything: the plans, the execution, and the monitoring. Test continuously so that you're ready when something happens.

Testing supports the plans we laid out in the first section, and planning and designing help you answer questions like whether you can afford to shut down a certain piece of functionality for a couple of hours. The decision to stop a service is not only driven by who is attacking which functionality; it is also driven by how much money the company might lose while that functionality is down.

You should test the entire data pipeline. Your contingency plans, the responsibility matrix, and the monitoring discussed previously all need to be tested. You will probably never be ready for every possible edge case, but you can be as ready as possible.

Changing the distribution of your data, such as using read replicas or distributing data across different clouds and continents, helps set and validate expectations for normal behavior. Once you define all these concepts, it's time to test them. Testing ranges from checking that your infrastructure as code works, to checking that you can recreate the same infrastructure on another cloud provider, to measuring how much time that takes.

You also need to verify the details: can you actually see or find the anomalies you defined, and do you get alerts when a new anomaly pops up? Testing spans everyone from the person clicking a button to the company's CEO; you should even rehearse the case where you walk up to the CEO and say, "OK, we're under attack."

A few tools for testing Kafka are:

  • Kafka Console Producer and Consumer: These built-in tools allow you to produce and consume messages from Kafka topics, making it easy to test the message flow (a sketch of a similar automated check using Testcontainers follows this list).
  • Apache Kafka® MirrorMaker 2: This tool lets you test replication and disaster recovery setups.
  • Schema Registry: This tool ensures your data schemas are correctly managed and validated.
  • Grafana: This tool lets you monitor Kafka metrics and set up alerts.
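
To complement the list above, here is a minimal sketch of an automated end-to-end check using Testcontainers for Kafka, the framework mentioned in the planning section. The Docker image tag and topic name are assumptions, and in a real project this would live in a JUnit test rather than a main method.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.utility.DockerImageName;

import java.util.Map;

public class KafkaContainerSmokeTest {
    public static void main(String[] args) throws Exception {
        // Spins up a throwaway broker in Docker; the image tag is an assumption,
        // so pick one your team actually supports.
        try (KafkaContainer kafka = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.6.0"))) {
            kafka.start();

            Map<String, Object> props = Map.of(
                "bootstrap.servers", kafka.getBootstrapServers(),
                "key.serializer", StringSerializer.class.getName(),
                "value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Blocks until the broker acknowledges the write.
                producer.send(new ProducerRecord<>("smoke-test", "key", "value")).get();
                System.out.println("Produced one record against " + kafka.getBootstrapServers());
            }
        }
    }
}
```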

Closing

If this seems overwhelming, some vendors, including Aiven, can help you out. Start by acknowledging the security challenges and vulnerabilities associated with Apache Kafka, such as DDoS attacks, unauthorized access, and data integrity issues. You can mitigate these problems through planning, security hygiene practices, monitoring of your infrastructure and data services, and continuous testing, and by relying on tools and vendors like Aiven to manage Kafka security and infrastructure.

With Aiven, we offer ways to take much of that effort off your shoulders. Aiven's security team secures Kafka and ten other open source services, available to everybody on top of any cloud, for companies such as Back Market, La Redoute, and Doccla.