kafka-users mailing list archives

From João Peixoto <joao.harti...@gmail.com>
Subject Re: Partitions as mechanism to keep multitenant segregated data
Date Tue, 23 May 2017 15:45:26 GMT
It seems like you're trying to use the partitioning mechanism as a routing
mechanism, which afaik is not really its objective.

It may work but it is definitely not the best approach imo.
1. You're throwing away the parallelism capabilities of Kafka. You'll have
a single "queue" per customer. At that point you could skip Kafka entirely
and just have a different REST endpoint for each customer, routed through
headers for example.
2. Repartitioning is a cumbersome affair. If your customer pool increases
past your projections you'll need to shut everyone down while you change
the number of partitions.
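
To make point 2 and Tom's "hash function and a key" suggestion (quoted
below) concrete, here is a minimal sketch of key-based partitioning. It is
an illustration only: CRC32 stands in for the hash (Kafka's default
partitioner uses murmur2 on the key bytes), and the customer names,
partition counts, and the partition_for helper are made up for the example.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition by hashing the key.

    CRC32 is a stand-in chosen for this sketch; it is NOT the hash Kafka uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical tenants, used only for illustration.
customers = [f"customer-{i}" for i in range(20)]

# Mapping with 10 partitions, then after growing the topic to 12.
before = {c: partition_for(c, 10) for c in customers}
after = {c: partition_for(c, 12) for c in customers}

# Keys whose partition changed purely because the partition count changed.
moved = [c for c in customers if before[c] != after[c]]
print(f"{len(moved)} of {len(customers)} customers changed partition")
```

All messages for one customer still land on the same partition as long as
the partition count stays fixed, but several customers can share a
partition, so this does not give the physical one-partition-per-customer
separation David asks about. Changing the partition count can move a key to
a different partition, which is part of why repartitioning is painful.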


On Tue, May 23, 2017 at 8:38 AM Tom Crayford <tcrayford@heroku.com> wrote:

> That might be ok. If that's the case, you can probably just "precreate" all
> the partitions for them upfront and avoid any worry about having to futz
> with consumers.
>
> On Tue, May 23, 2017 at 4:33 PM, David Espinosa <espixxl@gmail.com> wrote:
>
> > Thanks for the answer Tom,
> > Indeed I will not have more than 10 or 20 customers per cluster, so that's
> > also the maximum number of partitions possible per topic.
> > Still a bad idea?
> >
> > 2017-05-23 16:48 GMT+02:00 Tom Crayford <tcrayford@heroku.com>:
> >
> > > Hi there,
> > >
> > > I don't know about the consumer, but I'd *strongly* recommend not
> > > designing your application around this. Kafka has severe and notable
> > > stability concerns with large numbers of partitions, and requiring "one
> > > partition per customer" is going to be limiting, unless you only ever
> > > expect to have *very* small customer numbers (hundreds at most, ever).
> > > Instead, use a hash function and a key, as recommended, to land
> > > customers on the same partition.
> > >
> > > Thanks
> > >
> > > Tom Crayford
> > > Heroku Kafka
> > >
> > > On Tue, May 23, 2017 at 9:46 AM, David Espinosa <espixxl@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > In order to keep the data from different customers (physically)
> > > > separated in our application, we are using a custom partitioner to
> > > > drive messages to a concrete partition of a topic. We know that we are
> > > > losing parallelism per topic this way, but our requirements regarding
> > > > multitenancy are higher than our throughput requirements.
> > > >
> > > > So, in order to increase the number of customers working on a cluster,
> > > > we are increasing the number of partitions dynamically per topic as
> > > > each new customer arrives, using kafka AdminUtilities.
> > > > Our problem arises when using the new kafka consumer and a new
> > > > partition is added to the topic: this consumer doesn't get updated
> > > > with the "new partition", and therefore messages driven into that new
> > > > partition never arrive at this consumer unless we reload the consumer
> > > > itself. What was surprising was to find that, using the old consumer
> > > > (configured to deal with Zookeeper), a consumer does get messages from
> > > > a newly added partition.
> > > >
> > > > Is there a way to emulate the old consumer behaviour when new
> > > > partitions are added in the new consumer?
> > > >
> > > > Thanks in advance,
> > > > David
> > > >
> > >
> >
>
