Quite an interesting title, right? At first, you would think that all these terms represent different things, but in fact they don’t! Apache Kafka is a distributed streaming platform, meaning that it can do three things:
- Act as a messaging system ( pub / sub )
- Store data in a distributed fashion
- Process data in a streaming way
Kafka stores and transmits data using elements called “records”. A record is nothing more than a piece of information that moves through, and is stored in, the system. Now, I’ve said that Kafka can act as a messaging system, so if we borrow the pub / sub terminology, the same element can be called a “message”. Because of the way Kafka stores data, it is also a great fit to serve as the core of an Event-Driven Architecture (EDA). In EDA, the “records” moving through the system are called “events”. So technically there is no difference between these three terms; all of them represent the same thing. It’s just that Kafka is a great fit for many problems, and that’s why multiple terms are used for the same thing.
The Structure of a Kafka Record
A Kafka record is composed of the following: headers, a key, a value, and a timestamp.
Sometimes we need to add metadata to our messages, and using headers is the best way to do that. A header is composed of two things: a key and a value. The key will always be an encoded String. It doesn’t really make sense to use numbers or complex objects to distinguish metadata fields, so that’s why Strings were preferred. The value, on the other hand, can be literally anything. Under the hood, it is stored as a byte array, so it’s up to you to decide what to put in there. It can be another String, an Integer or even a JSON object. As long as you have the serialization and deserialization process correctly in place, there is no limitation.
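To make the “String key, byte-array value” idea a bit more concrete, here is a minimal sketch in plain Python. It does not use any Kafka client library; the `Header` alias, the two helper functions and the header names (`source`, `retry-count`, `trace`) are all hypothetical and only model the idea that each value is turned into bytes by whatever format you pick, and read back with the matching format.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical model of a header: a String key paired with a raw byte-array value.
Header = Tuple[str, bytes]


def build_headers() -> List[Header]:
    """Serialize three different kinds of values into byte arrays."""
    return [
        # A String, encoded as UTF-8.
        ("source", "checkout-service".encode("utf-8")),
        # An Integer, encoded as 4 big-endian bytes (an arbitrary choice).
        ("retry-count", (3).to_bytes(4, byteorder="big")),
        # A JSON object, encoded as UTF-8 text.
        ("trace", json.dumps({"id": "abc", "sampled": True}).encode("utf-8")),
    ]


def read_headers(headers: List[Header]) -> Dict[str, object]:
    """Deserialize each value with the format that was used to write it."""
    decoded: Dict[str, object] = {}
    for key, raw in headers:
        if key == "retry-count":
            decoded[key] = int.from_bytes(raw, byteorder="big")
        elif key == "trace":
            decoded[key] = json.loads(raw.decode("utf-8"))
        else:
            decoded[key] = raw.decode("utf-8")
    return decoded
```

Note that the reader has to know which format belongs to which header key. The bytes themselves carry no type information, which is why both sides need to agree on the serialization up front.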
You might be thinking “I’m sending messages over a message broker, why would I need keys for my messages?” This is an extremely valid question, and in a lot of use cases you don’t really need keys for your messages. Message keys are useful, though, in some scenarios like partitioned topics, compacted topics or when you need to perform some aggregations. There is a lot to say about these topics, but I’m going to save them for other blog posts. Regarding the format of a Kafka record key, well … it can be anything. Under the hood, it is stored as a byte array, so just make sure that both your producer and consumer are aware of the serialization format used.
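As a rough illustration of why keys matter for partitioned topics, the sketch below serializes a key to bytes and maps it to a partition number by hashing. Everything here is an assumption made for the example: the function names are hypothetical, and `zlib.crc32` is only a stand-in hash chosen because it ships with Python. It is not the hash that Kafka’s own partitioner uses. The point is just that equal key bytes always land on the same partition.

```python
import zlib


def serialize_key(key: str) -> bytes:
    """Turn the key into the byte array that would be stored in the record."""
    return key.encode("utf-8")


def choose_partition(key_bytes: bytes, num_partitions: int) -> int:
    """Map key bytes to a partition index (stand-in hash, for illustration only)."""
    return zlib.crc32(key_bytes) % num_partitions


# Two records with the same key end up on the same partition.
first = choose_partition(serialize_key("customer-42"), num_partitions=6)
second = choose_partition(serialize_key("customer-42"), num_partitions=6)
```

Because the partition is derived from the serialized bytes, a producer that changes its key serialization format could silently change where records go, which is one more reason to keep producer and consumer in sync on the format.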
The value represents the core of the record. Messages ( or events ) are used to transmit data, and that data, whether it is a simple hover event in the browser or a full-blown “order” object, has to be carried somewhere in the message. The message value is the place where we should store these objects. Just like the message key, the value will be converted to a byte array, so we can use any kind of serialization format. Some of the most popular serialization formats in the Kafka world are JSON and Avro, and nowadays Protobuf is also picking up some speed.
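Here is a small sketch of what “converted to a byte array” can look like when JSON is the chosen format. It uses only Python’s standard library, and the `order` object and both function names are made up for the example; with Avro or Protobuf the shape would be the same, only the two functions would change.

```python
import json


def serialize_value(obj: dict) -> bytes:
    """Producer side: object -> JSON text -> UTF-8 bytes."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def deserialize_value(raw: bytes) -> dict:
    """Consumer side: UTF-8 bytes -> JSON text -> object."""
    return json.loads(raw.decode("utf-8"))


# A hypothetical "order" object used as the record value.
order = {"order_id": 1001, "items": ["book", "pen"], "total": 24.5}
raw = serialize_value(order)
restored = deserialize_value(raw)
```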
Messages are produced over time ( not all at once ), so we need a way to determine when a specific message was produced. By default, the Kafka Producer sets a timestamp when a new message is produced. The underlying representation of the timestamp is the number of milliseconds since the Unix epoch. This doesn’t mean that all the messages will be ordered by timestamp on the Kafka broker! Sometimes a message with a later timestamp can arrive on the broker before a message with an earlier timestamp.
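The two points above, the millisecond representation and the fact that arrival order is not timestamp order, can be sketched in a few lines of plain Python. The `arrived` list is invented data standing in for the order in which three messages reached the broker; no Kafka library is involved.

```python
import time


def now_millis() -> int:
    """Current time as milliseconds since the Unix epoch."""
    return time.time_ns() // 1_000_000


def is_timestamp_ordered(records) -> bool:
    """True if every record's timestamp is >= the one before it."""
    stamps = [ts for ts, _ in records]
    return all(earlier <= later for earlier, later in zip(stamps, stamps[1:]))


# Hypothetical arrival order on the broker: (timestamp_ms, payload).
# "b" was produced after "a" but arrived first.
arrived = [
    (1_700_000_000_250, "b"),
    (1_700_000_000_100, "a"),
    (1_700_000_000_400, "c"),
]

# If you need timestamp order, you have to sort explicitly.
by_timestamp = sorted(arrived, key=lambda record: record[0])
```

So if your logic depends on when something was produced, read the timestamp from the record instead of relying on the position of the message in the log.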