At first glance, these 2 terms seem interchangeable because of their similarities, but let me assure you that they are not. I think the confusion started with the release of Apache Kafka, which is defined as a “distributed streaming platform” but also offers pub/sub capabilities, so the two terms began to blur. I have identified 3 key differences between these 2 patterns:
Pub / Sub
In the Pub / Sub pattern the most important unit is the message itself. We can classify the destinations of those messages into queues (a single message is received by exactly one consumer) and topics (a single message is received by all subscribed consumers). Either way, the focus is always on the message, because our goal is to transfer a piece of information from a publisher to a subscriber.
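To make the queue/topic distinction concrete, here is a minimal in-memory sketch. The `Broker` class, its method names, and the round-robin choice for queues are all assumptions made for illustration; real brokers decide which queue consumer gets a message in their own ways.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker (hypothetical) showing queue vs. topic delivery."""

    def __init__(self):
        self.queue_consumers = defaultdict(list)    # queue name -> handlers
        self.topic_subscribers = defaultdict(list)  # topic name -> handlers
        self._next = defaultdict(int)               # round-robin position per queue

    def consume_queue(self, name, handler):
        self.queue_consumers[name].append(handler)

    def subscribe_topic(self, name, handler):
        self.topic_subscribers[name].append(handler)

    def send_to_queue(self, name, message):
        # Queue: exactly one consumer receives each message.
        consumers = self.queue_consumers[name]
        if not consumers:
            raise LookupError(f"no consumer registered for queue {name!r}")
        handler = consumers[self._next[name] % len(consumers)]
        self._next[name] += 1
        handler(message)

    def publish_to_topic(self, name, message):
        # Topic: every subscriber receives the message.
        for handler in self.topic_subscribers[name]:
            handler(message)


def demo_delivery():
    broker = Broker()
    queue_a, queue_b, topic_a, topic_b = [], [], [], []
    broker.consume_queue("orders", queue_a.append)
    broker.consume_queue("orders", queue_b.append)
    broker.subscribe_topic("prices", topic_a.append)
    broker.subscribe_topic("prices", topic_b.append)

    broker.send_to_queue("orders", "o1")
    broker.send_to_queue("orders", "o2")
    broker.publish_to_topic("prices", "p1")
    return (queue_a, queue_b), (topic_a, topic_b)
```

Calling `demo_delivery()` splits the two queue messages between the two queue consumers, while both topic subscribers see the single published message.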
When we talk about Streaming, the focus is on the data stream and not on the individual message. Of course, we still care about our delivery semantics (at-most-once, at-least-once, exactly-once), but when we adopt stream processing we care mostly about the flow of data going through our systems.
There are 2 main considerations backing up the above statements:
- we can always “go back in time”: if something goes wrong while processing a message, we can consume it again by resetting our consumer offset; in classic pub/sub that is usually not an option, because the broker typically discards a message once it has been delivered and acknowledged;
- stream processing lets us process every message individually, but we can also perform more complex operations at the data stream level, like aggregations (processing multiple messages from the same stream) and joins (combining messages from different streams).
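Both considerations can be sketched with a small in-memory log. This is an illustration only: `Log`, `Consumer`, `total_per_customer`, and `join_on_customer` are hypothetical names, and the record fields are invented for the example. It does not use any real streaming client API.

```python
from collections import defaultdict


class Log:
    """Append-only log: records stay available after they have been read."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the record just written


class Consumer:
    """Reads from a Log and tracks its own position (offset)."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        # Return everything from the current offset, then advance past it.
        batch = self.log.records[self.offset:]
        self.offset = len(self.log.records)
        return batch

    def seek(self, offset):
        # "Go back in time": the next poll re-reads from this offset.
        self.offset = offset


def total_per_customer(orders):
    """Aggregation: combine many records from one stream."""
    totals = defaultdict(int)
    for order in orders:
        totals[order["customer"]] += order["amount"]
    return dict(totals)


def join_on_customer(orders, customers):
    """Join: enrich records of one stream with records of another."""
    names = {c["id"]: c["name"] for c in customers}
    return [
        {**order, "name": names[order["customer"]]}
        for order in orders
        if order["customer"] in names
    ]
```

A consumer that failed halfway through a batch can call `seek(0)` and poll again to get the same records back, which is the behavior the first bullet describes. The two helper functions operate on whole batches rather than on single messages, which is the point of the second bullet.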
Pub / Sub
Asynchronous communication will always be the case for the publish/subscribe pattern. The publisher and the subscriber never know about each other, because there is always a broker sitting between them, facilitating the exchange of messages.
Most often you will see stream processing done asynchronously, but this is not a hard requirement. We can also do synchronous stream processing by adopting a request-response behavior.
Synchronous Stream Processing
When I designed this diagram, I envisioned Apache Kafka as the Data Hub, with each of the message placeholders being a separate topic on the Kafka broker. As you will notice from the diagram, we have 2 synchronous steps:
- creating the order (has to wait for the second step to give back a response):
  - Request – create the order
  - Response – order created
- altering the stock:
  - Request – update the stocks
  - Response – stocks updated
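The two synchronous steps above can be sketched as request-response over pairs of topics. This is a rough in-process approximation, not Kafka code: `Topic`, `serve`, `call`, and `place_order` are hypothetical helpers, each topic is simulated with a thread-safe queue, and a correlation id is assumed as the way to match a response to its request.

```python
import queue
import threading
import uuid


class Topic:
    """Stand-in for a broker topic: a thread-safe FIFO of messages."""

    def __init__(self):
        self._messages = queue.Queue()

    def publish(self, message):
        self._messages.put(message)

    def take(self, timeout=None):
        # Blocks until a message arrives; raises queue.Empty on timeout.
        return self._messages.get(timeout=timeout)


def serve(requests, responses, handler):
    """Start a worker that answers each request until it reads None."""

    def loop():
        while True:
            message = requests.take()
            if message is None:
                return
            responses.publish({"id": message["id"], "body": handler(message["body"])})

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


def call(requests, responses, body, timeout=2.0):
    """Publish a request and block until the matching response arrives."""
    correlation_id = str(uuid.uuid4())
    requests.publish({"id": correlation_id, "body": body})
    while True:
        reply = responses.take(timeout=timeout)
        # Simplification: replies for other callers are dropped, so this
        # sketch assumes one caller per response topic.
        if reply["id"] == correlation_id:
            return reply["body"]


def place_order(item):
    order_requests, order_responses = Topic(), Topic()
    stock_requests, stock_responses = Topic(), Topic()
    stock = {"widget": 3}

    def update_stock(name):
        stock[name] -= 1
        return "stocks updated"

    def create_order(name):
        # Creating the order waits for the stock step before it replies.
        ack = call(stock_requests, stock_responses, name)
        return f"order created ({ack})"

    serve(stock_requests, stock_responses, update_stock)
    serve(order_requests, order_responses, create_order)
    result = call(order_requests, order_responses, item)

    # Shut the workers down.
    order_requests.publish(None)
    stock_requests.publish(None)
    return result, stock[item]
```

The caller of `place_order` blocks until the order step has replied, and the order step in turn blocks until the stock step has replied, so the whole flow behaves synchronously even though every message still travels through a topic.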
Pub / Sub
The ecosystem around pub/sub consists pretty much of a single component, the message broker. There is no need for other tools to be part of it, since this system is designed to do one thing and one thing only.
As I mentioned previously, while streaming we need to do some more complex operations, and so the need for a richer ecosystem arises. We can have additional tools that deal with data governance, tools for integration with other common systems, and even tools that help us create streaming applications through different methods (using object-oriented programming languages, functional programming languages, or even SQL-like languages).
Cheat Sheet of the Key Differences between Pub/Sub and Streaming
I hope you’ve got a pretty good idea of the key differences between these 2 patterns! Let me know what you think about this subject in the comments below or on your favorite social media network! Also, feel free to share this post with anyone you like!