To explain multi-tenancy, I’m going to use a common scenario that you’re probably familiar with: renting apartments. People often rent apartments with their friends (especially when they are younger). Why? Well, they probably don’t need an entire apartment for themselves, and it’s also cheaper that way. By renting together, each person has their own private space without overspending on a dedicated apartment. The same principle applies to software multi-tenancy. The “apartment” is an actual application, whereas the tenants renting it can be either companies or groups of users of that software that need to keep their data private.
Single vs. Multi-Tenant App
Apache Kafka Multi-Tenancy
There are 3 levels on which the multi-tenant concept can be implemented in Apache Kafka: records, partitions, and topics.
Multi-Tenant Records
It’s no secret that clients can produce records (also called messages or events) to Apache Kafka and consume them from it. Implementing multi-tenancy at this level is very similar to implementing it on a relational database. There, it is usually done through a “tenants” table that stores each tenant in a separate row, identified by a primary key. For simplicity, we’ll consider the primary key to be an id. Every other entity that needs to store multi-tenant data then holds the tenant id as a foreign key reference.
If we transpose this model to Kafka, we would have a compacted topic called “tenants” on which we publish records describing our tenants, while all the other events that need to be multi-tenant are stored in separate topics. The “tenants” topic has to be compacted (cleanup.policy=compact) because we do not want Kafka to delete tenants once a retention limit is reached. A tenant should disappear only when we decide so, by publishing a tombstone message (a record with the tenant’s key and a null value).
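To make the compaction and tombstone behaviour concrete, here is a minimal sketch in plain Python. It does not talk to Kafka at all; it only models what a consumer ends up with after replaying a compacted “tenants” topic. The `compact` function, the tenant ids, and the JSON payloads are made up for illustration, and real compaction runs asynchronously on the broker, so treat this as a simplification.

```python
from typing import Dict, List, Optional, Tuple

# A log entry is (key, value); a value of None stands for a tombstone.
LogEntry = Tuple[str, Optional[str]]


def compact(log: List[LogEntry]) -> Dict[str, str]:
    """Replay the log and keep only the latest value per key.

    This mimics the end state of a compacted topic: older values for a
    key are superseded, and a tombstone removes the key entirely.
    """
    state: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)  # tombstone: the tenant is deleted
        else:
            state[key] = value    # newer record replaces the older one
    return state


# Hypothetical contents of the "tenants" topic, oldest record first.
tenants_log: List[LogEntry] = [
    ("tenant-a", '{"name": "Acme"}'),
    ("tenant-b", '{"name": "Globex"}'),
    ("tenant-a", '{"name": "Acme Corp"}'),  # update for tenant-a
    ("tenant-b", None),                     # tombstone for tenant-b
]

current_tenants = compact(tenants_log)
```

After the replay, only `tenant-a` remains, with its most recent value. This is the reason the tenant id has to be the record key on this topic.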
What about multi-tenant topics? Where do we store the tenant id for each record? There are 3 options that we can use to solve this problem:
- key: storing the tenant id as the key of the message might be a good idea, but we also have to take into consideration that Kafka’s default partitioner uses the message key to choose the partition; this approach is also very limiting on compacted topics, because compaction keeps only the latest record per key, that is, one record per tenant;
- value: embedding the tenant id in the message value can be a good solution, but as a drawback it can complicate the message schema;
- header: if we view the fact that a message belongs to a tenant as metadata, we can store the tenant id as a message header; this way the impact on message schemas and partitioning strategies is minimal.
Pros:
- easy to implement;
- minimal hassle on the Kafka side.
Cons:
- proxy required: the messages in these topics cannot be accessed directly by individual tenants; a proxy service is required to deal with authorization;
- security risk: if a non-multi-tenant application gets access to one of these topics, it will have access to all tenants’ information.
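The header option and the proxy requirement can be sketched together. The snippet below is plain Python with no Kafka client involved: `Record`, `TENANT_HEADER`, `tag_with_tenant`, and `records_for_tenant` are hypothetical names introduced only for this illustration, and headers are modelled as a simple dict even though Kafka stores them as a list of key/bytes pairs.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Iterable, List, Optional

TENANT_HEADER = "tenant-id"  # hypothetical header name, pick your own


@dataclass(frozen=True)
class Record:
    """Simplified stand-in for a Kafka record."""
    key: Optional[str]
    value: bytes
    headers: Dict[str, bytes] = field(default_factory=dict)


def tag_with_tenant(record: Record, tenant_id: str) -> Record:
    """Producer side: attach the tenant id as a header.

    The key and value are left untouched, so schemas and key-based
    partitioning are not affected.
    """
    headers = dict(record.headers)
    headers[TENANT_HEADER] = tenant_id.encode("utf-8")
    return replace(record, headers=headers)


def records_for_tenant(records: Iterable[Record], tenant_id: str) -> List[Record]:
    """Proxy side: return only the records that belong to one tenant.

    Untagged records are dropped rather than exposed to anyone.
    """
    wanted = tenant_id.encode("utf-8")
    return [r for r in records if r.headers.get(TENANT_HEADER) == wanted]


# A shared topic holding records of several tenants.
shared_topic = [
    tag_with_tenant(Record("order-1", b"..."), "tenant-a"),
    tag_with_tenant(Record("order-2", b"..."), "tenant-b"),
    tag_with_tenant(Record("order-3", b"..."), "tenant-a"),
    Record("order-4", b"..."),  # no tenant header at all
]

visible_to_a = records_for_tenant(shared_topic, "tenant-a")
```

Note that the filtering happens in application code. Anything that reads the topic directly, bypassing the proxy, still sees every tenant’s records, which is exactly the security risk listed above.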
Multi-Tenant Partitions
In theory, this sounds super easy: each tenant uses one partition (or more) to store its messages on a topic. For example, if we have 3 tenants, we can have a multi-tenant topic with 3 partitions, and each tenant produces its messages to its own partition. The problem with this solution lies in the complexity of the Partitioner: we would need a fully custom partitioner to make this happen. Also, this solution is not very scalable. Think about adding or removing tenants: changing the partition count every time a new tenant gets added is not the prettiest solution, and Kafka only lets us increase the partition count of a topic, never decrease it.
Pros:
- easy to consume messages of a specific tenant;
- easy to delete messages of a specific tenant.
Cons:
- a custom Partitioner is required;
- the same security risk as with multi-tenant records;
- not scalable: it’s difficult to add/remove tenants.
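To show what such a custom partitioner has to do, and why adding tenants hurts, here is a rough sketch of the mapping logic in plain Python. It is not Kafka’s Partitioner interface. `TenantPartitioner` and its methods are hypothetical, and the sketch assumes exactly one partition per tenant.

```python
from typing import Dict, List


class TenantPartitioner:
    """Maps each tenant to one dedicated partition (illustrative only)."""

    def __init__(self, tenants: List[str]) -> None:
        # Partition numbers are handed out in registration order.
        self._partition_of: Dict[str, int] = {
            tenant: index for index, tenant in enumerate(tenants)
        }

    @property
    def required_partitions(self) -> int:
        """Partitions the topic must have to serve every tenant."""
        return len(self._partition_of)

    def add_tenant(self, tenant_id: str) -> int:
        """Register a tenant and return the partition reserved for it."""
        if tenant_id not in self._partition_of:
            self._partition_of[tenant_id] = len(self._partition_of)
        return self._partition_of[tenant_id]

    def partition(self, tenant_id: str, num_partitions: int) -> int:
        """Pick the partition for a record of the given tenant."""
        if tenant_id not in self._partition_of:
            raise KeyError(f"unknown tenant: {tenant_id}")
        partition = self._partition_of[tenant_id]
        if partition >= num_partitions:
            # The topic has not been expanded for this tenant yet.
            raise ValueError(
                f"topic has {num_partitions} partitions, "
                f"tenant {tenant_id} needs partition {partition}"
            )
        return partition


partitioner = TenantPartitioner(["tenant-a", "tenant-b", "tenant-c"])
partition_for_b = partitioner.partition("tenant-b", num_partitions=3)
```

Calling `add_tenant("tenant-d")` reserves partition 3, but producing for that tenant fails until the topic itself is expanded to 4 partitions. Removing a tenant is even more awkward, because its partition cannot be dropped and would have to be reused or left empty.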
Multi-Tenant Topics
This approach requires some bookkeeping on the Kafka side, but it is the most secure and it imposes the fewest limitations compared to the other models. The idea is to either prefix or suffix the topic name with the tenant’s name or id, e.g. “tenantA.orders” or “orders.tenantA”. Using this approach, we can then define separate ACLs (Access-Control Lists) for each tenant-topic combination, so there is no risk of leaking information to other tenants. Adding and removing tenants is also quite easy, since the only thing we have to do is create or delete a topic. This approach also enables the use of quotas per tenant, so we can prevent one tenant from taking over the entire cluster. The biggest downside is the hassle required to keep track of the topics and ACLs.
Pros:
- the most secure approach;
- can make use of quotas per tenant;
- easily scalable;
- no custom code required.
Cons:
- a lot of bookkeeping is required on the Kafka side;
- multi-tenant applications need to dynamically create producers and consumers for each tenant’s topics.
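The naming convention and the “dynamically create producers and consumers” part can be sketched as follows. This is plain Python with no Kafka client: `tenant_topic`, `tenant_prefix`, and `TenantClientRegistry` are hypothetical helpers, the prefix convention (“tenantA.orders”) is the one from the example above, and the factory passed to the registry stands in for whatever creates a real producer or consumer in your client library.

```python
import re
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")

# Tenant ids without dots keep "tenant.topic" unambiguous (our own rule).
_TENANT_ID = re.compile(r"^[A-Za-z0-9_-]+$")


def tenant_prefix(tenant_id: str) -> str:
    """Prefix shared by all topics of a tenant, e.g. 'tenantA.'."""
    if not _TENANT_ID.match(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"{tenant_id}."


def tenant_topic(tenant_id: str, base_topic: str) -> str:
    """Build the per-tenant topic name, e.g. 'tenantA.orders'."""
    return tenant_prefix(tenant_id) + base_topic


class TenantClientRegistry(Generic[T]):
    """Lazily creates and caches one client (producer/consumer) per tenant."""

    def __init__(self, factory: Callable[[str], T]) -> None:
        self._factory = factory
        self._clients: Dict[str, T] = {}

    def get(self, tenant_id: str) -> T:
        if tenant_id not in self._clients:
            self._clients[tenant_id] = self._factory(tenant_id)
        return self._clients[tenant_id]

    def remove(self, tenant_id: str) -> None:
        """Forget a tenant's client, e.g. when the tenant is deleted."""
        self._clients.pop(tenant_id, None)


# Stand-in factory: a real one would build a producer with the
# tenant's credentials; here we just return a descriptive dict.
def make_fake_producer(tenant_id: str) -> Dict[str, str]:
    return {"tenant": tenant_id, "topic": tenant_topic(tenant_id, "orders")}


producers = TenantClientRegistry(make_fake_producer)
producer_a = producers.get("tenantA")
```

A consistent prefix also helps with the bookkeeping: Kafka ACLs support prefixed resource patterns, so a single rule on `tenant_prefix("tenantA")` can cover all of that tenant’s topics instead of one rule per topic.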