Project Metamorphosis: Unveiling the next-gen event streaming platformLearn More

Secure Stream Processing with the Streams API in Kafka

This blog post is the second in a series about the Streams API of Apache Kafka, the new stream processing library of the Apache Kafka project, which was introduced in Kafka v0.10.

Current blog posts in the Streams API in Kafka series:

  1. Elastic Scaling in the Streams API in Kafka
  2. Secure Stream Processing with the Streams API in Kafka (this post)
  3. Data Reprocessing with the Streams API in Kafka: Resetting a Streams Application

In this post we describe the security features of the Streams API in Kafka.  Many use cases and applications — whether it is in the area of stream processing or elsewhere — have tight internal and/or external security requirements. Industries where such requirements are common include finance and healthcare in the private sector but also include governmental services.  Legal compliance, for example, may require you to implement certain security measures such as encrypting data-in-transit when you are working on sensitive data such as personally identifiable information. You may also be required to enforce authentication and authorization to limit access to data to only a subset of employees or personnel to adhere to information security policies such as Need-to-know.

The question we want to answer in this blog post is:

  • Question (secure stream processing): If you are working in sensitive and regulated environments, what are your security options with regards to stream processing, notably when you are using a. Apache Kafka as the foundation of your data infrastructure and b. the Streams API as the means to process the data in Apache Kafka?

Let’s start with a quick answer to this question:

  • Answer (secure stream processing): Apache Kafka ships with a range of security features including but not limited to client authentication, client authorization, and data encryption.  These security features help you to implement your security policies and to protect your valuable data against internal and external threats.  On the side of processing the data in Kafka, your best option is to use the Streams API in Kafka to build your stream processing applications because it integrates natively with Kafka’s security features.

We can now walk through this answer in further detail.

First, which security features are available in Apache Kafka, and thus in the Streams API?  The Streams API in Kafka supports all the client-side security features in Apache Kafka.  In this short blog post we cannot cover these client-side security features in full detail, so I recommend reading the Kafka Security chapter in the Confluent Platform documentation and our previous blog post Apache Kafka Security 101 to familiarize yourself with the security features that are currently available in Apache Kafka.

That said, let me highlight a couple of important Kafka security features that are essential for implementing robust data infrastructures, whether these are used for building horizontal services at larger companies, for multi-tenant infrastructures (e.g. microservices), or for shared platforms such as in the Internet of Things.  Later on I will then demonstrate an example application where we use some of these security features in the Streams API in Kafka.

Kafka security features include:

  1. Encrypting data-in-transit between the servers of a Kafka cluster:  You can enable the encryption of broker-to-broker communication.  Brokers communicate with each other, for example, to replicate data for fault-tolerance.
  2. Encrypting data-in-transit between Kafka servers and Kafka clients: You can enable the encryption of the client-server communication between the Kafka servers/brokers and Kafka clients.  Kafka clients include stream processing applications built using the Streams API in Kafka library.
    • Example:  You can configure your Streams API applications to always use encryption when reading data from Kafka and when writing data to Kafka; this is very important when reading/writing data across security domains (e.g. internal network vs. public Internet or partner network).
  3. Client authentication: You can enable client authentication for connections from Kafka clients (including the Streams API in Kafka) to Kafka brokers/servers.
    • Example:  You can define that only some specific Streams API applications are allowed to connect to your production Kafka cluster.
  4. Client authorization: You can enable client authorization of read/write operations by Kafka clients.
    • Example:  You can define that only some specific Streams API application are allowed to read from a Kafka topic that stores sensitive data.  Similarly, you can restrict write access to certain Kafka topics to only a few stream processing applications to prevent e.g. data pollution or fraudulent activities.

It’s worth noting that the aforementioned security features in Apache Kafka are optional, and it is up to you to decide whether to enable or disable any of them.  And you can mix and match these security features as needed: both secured and non-secured Kafka clusters are supported, as well as a mix of authenticated, unauthenticated, encrypted and non-encrypted clients.  This flexibility allows you to model the security functionality in Kafka to match your specific needs, and to make effective cost vs. benefit (read: security vs. convenience/agility) tradeoffs: tighter security requirements in places where security matters (e.g. production), and relaxed requirements in other situations (e.g. development, testing).

Second, how do you use these security features in the Streams API i.e. when building your own stream processing applications?  The most important aspect to understand is that the Streams API leverages the standard Kafka producer and consumer clients behind the scenes.  Hence what you need to do to secure your stream processing applications is to configure the appropriate security settings of the corresponding Kafka producer/consumer clients.  Once you know which client-side security features you want to use, you simply need to include the corresponding settings in the configuration of your Streams API application.

Let’s show a simple example, based our previous blog post Apache Kafka Security 101.  What we want to do is to configure our Streams API application to 1. encrypt data-in-transit when communicating with its target Kafka cluster and 2. enable client authentication.

Tip: A complete demo application including step-by-step instructions is available at SecureKafkaStreamsExample.java under https://github.com/confluentinc/examples.

For the sake of brevity, we assume that a. the security setup of the Kafka brokers in the cluster is already completed and b. the necessary SSL certificates are available to your Streams API in Kafka application in the filesystem locations specified below (the aforementioned blog post walks you through the steps to generate them); for example, if you are using Docker to containerize your Streams API in Kafka applications, then you must also include these SSL certificates in the right locations within the Docker image.

Once these two assumptions are met, you must only configure the corresponding settings for the Kafka clients in your Streams API application.  The configuration snippet below shows the settings to enable client authentication and enable SSL encryption for data-in-transit between your Streams API application and the Kafka cluster it is reading from and writing to:

Within a Streams API application, you’d use code such as the following to configure these settings in your StreamsConfig instance:

With these settings in place your Streams API in Kafka application will encrypt any data-in-transit that is being read from or written to Kafka, and it will also authenticate itself against the Kafka brokers that it is talking to.  (Note that this simple example does not cover client authorization.)

Now what would happen if you misconfigured the security settings in your Kafka Streams application?  In this case, the application would fail at runtime, right after you started it.  For example, if you entered an incorrect password for the ssl.keystore.password setting, then the following error messages would be logged, and after that the application would terminate:

Similar exceptions would be thrown if you misconfigured other security settings such as ssl.key.password:

Your Operations team can monitor the log files of your Streams API applications for such error messages to spot any misconfigured applications quickly, and to alert the corresponding teams.

In summary, the Streams API in Kafka makes your stream processing applications secure, and it achieves this through its native integration with Apache Kafka’s security functionality.

If you have enjoyed this article, you might want to continue with the following resources to learn more about Apache Kafka’s Streams API:

And if you are interested in further information on Kafka’s security features, I’d recommend to read Apache Kafka Security 101.

Did you like this blog post? Share it now

Subscribe to the Confluent blog

More Articles Like This

Best Practices to Secure Your Apache Kafka Deployment

For many organizations, Apache Kafka® is the backbone and source of truth for data systems across the enterprise. Protecting your event streaming platform is critical for data security and often […]

Kafka Streams Interactive Queries Go Prime Time

What is stopping you from using Kafka Streams as your data layer for building applications? After all, it comes with fast, embedded RocksDB storage, takes care of redundancy for you, […]

From Eager to Smarter in Apache Kafka Consumer Rebalances

Everyone wants their infrastructure to be highly available, and ksqlDB is no different. But crucial properties like high availability don’t come without a thoughtful, rigorous design. We thought hard about […]

Sign Up Now

Start your 3-month trial. Get up to $200 off on each of your first 3 Confluent Cloud monthly bills

新規登録のみ。

上の「新規登録」をクリックすることにより、当社がお客様の個人情報を以下に従い処理することを理解されたものとみなします : プライバシーポリシー

上記の「新規登録」をクリックすることにより、お客様は以下に同意するものとします。 サービス利用規約 Confluent からのマーケティングメールの随時受信にも同意するものとします。また、当社がお客様の個人情報を以下に従い処理することを理解されたものとみなします: プライバシーポリシー

単一の Kafka Broker の場合には永遠に無料
i

商用版の機能を単一の Kafka Broker で無期限で使用できるソフトウェアです。2番目の Broker を追加すると、30日間の商用版試用期間が自動で開始します。この制限を単一の Broker へ戻すことでリセットすることはできません。

デプロイのタイプを選択
手動デプロイ
  • tar
  • zip
  • deb
  • rpm
  • docker
または
自動デプロイ
  • kubernetes
  • ansible

上の「無料ダウンロード」をクリックすることにより、当社がお客様の個人情報をプライバシーポリシーに従い処理することを理解されたものとみなします。 プライバシーポリシー

以下の「ダウンロード」をクリックすることにより、お客様は以下に同意するものとします。 Confluent ライセンス契約 Confluent からのマーケティングメールの随時受信にも同意するものとします。また、お客様の個人データが以下に従い処理することにも同意するものとします: プライバシーポリシー

このウェブサイトでは、ユーザーエクスペリエンスの向上に加え、ウェブサイトのパフォーマンスとトラフィック分析のため、Cookie を使用しています。また、サイトの使用に関する情報をソーシャルメディア、広告、分析のパートナーと共有しています。