Apache Kafka has been getting a lot of attention lately and, to be honest, that’s great! As developers, we get tons of benefits when a product becomes that popular, because the ecosystem around it grows as well.
It’s pretty hard to find a single place that lists all the tools that have been built on top of Apache Kafka, so I’ve decided to gather as much information as I could and present it in a series of blog posts. If something is missing, let me know and I’ll gladly add any tool or service to the list.
Having said that, I must also mention that my interest is NOT to advocate for or against any tool, service, product, and/or company, but simply to present a list of what is out there on the market (open source and enterprise) and its capabilities.
The first post will be, of course, about the Kafka Core Ecosystem.
Apache Kafka Core Ecosystem
The core ecosystem is, well… composed of the core components developed by the open source maintainers as part of the Apache project.
The Kafka Broker is the centerpiece of the entire ecosystem; without it nothing would be possible. It is the core component that handles the distribution and replication of messages across all its instances. It acts as an intermediary between the source of the messages, also called the producer, and the sink, which is called the consumer. If multiple broker instances are coupled together as a “team”, they form a cluster, hence the name Kafka Cluster.
Although ZooKeeper is NOT part of the Apache Kafka core ecosystem, I’m going to include it here since the Kafka Brokers actually require a ZooKeeper ensemble for different purposes (leader election, topic management, cluster membership, quotas, and ACLs). In the future, this component will no longer be required; check KIP-500.
There are three clients that can be used to interact with the Kafka Broker:
We need a way to send messages to the Kafka Broker, and that is what the Producer API can help us with. There are implementations in lots of programming languages (at least the most popular ones), so chances are you’ll find a client for whatever language you’re planning to work with.
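To give you a feel for the Producer API, here is a minimal sketch using the official Java client. It assumes a broker running on `localhost:9092` and a topic called `greetings` — both are placeholders for your own setup.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        // Minimal configuration: where the brokers are and how to serialize keys/values.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // try-with-resources ensures the producer flushes and closes cleanly.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Fire-and-forget send; send() also returns a Future<RecordMetadata>
            // if you want to confirm delivery.
            producer.send(new ProducerRecord<>("greetings", "key-1", "hello, kafka"));
        }
    }
}
```

Note that `send()` is asynchronous; the client batches records in the background, which is a big part of why Kafka producers achieve high throughput.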
We’ve managed to produce messages to the Kafka Broker, so now we need a way to retrieve them. The Consumer API was created for exactly this use case. It supports lots of configuration options and even allows grouping consumers to work as a team, using the concept of a “consumer group”.
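A matching consumer sketch looks like this. The `group.id` setting is what places the consumer in a consumer group: partitions of the topic are divided among all consumers sharing the same group id. The broker address, topic, and group name below are assumptions for illustration.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "greetings-readers");          // consumers sharing this id form a group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest");          // start from the beginning if no offset exists

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("greetings"));
            while (true) {
                // poll() fetches the next batch of records; the loop is the typical pattern.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```

If you start a second instance with the same `group.id`, the broker rebalances the topic’s partitions between the two — that’s the “working as a team” part.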
Just like its name says, the Admin Client can be used to perform administrative tasks against the Kafka Broker. Some of these operations include creating, listing, and deleting topics, managing Access Control Lists (ACLs for short), and even deleting records present on the Kafka Broker.
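Here is a small Admin Client sketch that creates a topic and then lists the topics on the cluster. The topic name, partition count, and replication factor are illustrative values; the replication factor cannot exceed the number of brokers you have.

```java
import java.util.Collections;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class SimpleAdmin {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // Create a topic with 3 partitions and replication factor 1
            // (fine for a single-broker dev setup).
            NewTopic topic = new NewTopic("orders", 3, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();

            // List all topic names on the cluster; the Admin API is async,
            // so we block on the returned future with get().
            Set<String> names = admin.listTopics().names().get();
            names.forEach(System.out::println);
        }
    }
}
```

Most Admin Client methods return futures wrapped in result objects, so you can batch several administrative operations and wait on them together.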
Having producers and consumers is cool, but it also implies a lot of boilerplate code, and there are use cases where this can become problematic. One of them is transferring data to or from an external system (e.g. databases, key-value stores, search engines, etc.), which is a very common type of application. Kafka Connect was built for exactly this use case. It allows transferring data in and out of the Kafka Cluster, from and to common data systems. It has an architecture similar to the Kafka Brokers’, which allows grouping the so-called “workers” to form a cluster. Also, the amount of code that you need to write is reduced to almost zero, since Kafka Connect leverages components called “connectors”, which serve as an integration layer between the system you want to connect to and Kafka Connect.
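Because connectors do the heavy lifting, “code” in Kafka Connect usually means a JSON configuration submitted to the Connect REST API. As a sketch, here is a configuration for the `FileStreamSourceConnector` that ships with Kafka, which streams lines from a file into a topic; the file path and topic name are assumptions for illustration.

```json
{
  "name": "file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "file-lines"
  }
}
```

You would typically POST this to a Connect worker’s `/connectors` endpoint, and the worker takes care of reading the file and producing each line as a record — no producer code required.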
Another very frequently used pattern is consuming data from a Kafka Cluster, processing it a bit, and then producing it again to Kafka on a different topic. This type of application is called a “streaming” application. Kafka Streams is nothing more than a Domain Specific Language (DSL) built on top of the Consumer and Producer APIs to reduce the amount of code needed to build such an application. Also, it comes with a lot of useful patterns (aggregating, windowing, etc.) already integrated, so that you won’t have to worry about some specific edge cases.
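To show how compact the DSL is, here is a minimal Kafka Streams sketch implementing the consume–transform–produce pattern described above: it reads from one topic, uppercases each value, and writes to another. The application id, broker address, and topic names are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;

public class UppercaseApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("application.id", "uppercase-app");      // also used as the consumer group id
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("default.key.serde", Serdes.String().getClass().getName());
        props.put("default.value.serde", Serdes.String().getClass().getName());

        // Build the topology: input-topic -> uppercase -> output-topic.
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("input-topic")
               .mapValues(value -> value.toUpperCase())
               .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Close the topology cleanly on shutdown.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Under the hood this is still the Consumer and Producer APIs at work, but the DSL hides the polling loop, offset management, and record forwarding entirely.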