kafka-users mailing list archives

From Dana Powers <dana.pow...@gmail.com>
Subject Re: Same partition number of different Kafka topics
Date Thu, 04 Aug 2016 03:00:19 GMT
kafka-python by default uses the same partitioning algorithm as the Java
client. If there are bugs, please let me know. I think the issue here is
with the default nodejs partitioner.

-Dana
On Aug 3, 2016 7:03 PM, "Jack Huang" <jackhuang@mz.com> wrote:

I see, thanks for the clarification.

On Tue, Aug 2, 2016 at 10:07 PM, Ewen Cheslack-Postava <ewen@confluent.io>
wrote:

> Jack,
>
> The partition is always selected by the client -- if it weren't the brokers
> would need to forward requests since different partitions are handled by
> different brokers. The only "default Kafka partitioner" is the one that you
> could consider "standardized" by the Java client implementation. Some
> client libraries will make this pluggable like the Java client does so you
> could use a compatible implementation.
>
> -Ewen
>
> On Fri, Jul 29, 2016 at 11:27 AM, Jack Huang <jackhuang@mz.com> wrote:
>
> > Hi Gerard,
> >
> > After further digging, I found that the clients we are using also have
> > different partitioners. The Python one uses murmur2
> > (https://github.com/dpkp/kafka-python/blob/master/kafka/partitioner/default.py),
> > and the NodeJS one uses its own implementation
> > (https://github.com/SOHU-Co/kafka-node/blob/master/lib/partitioner.js).
> > Does Kafka delegate the task of partitioning to the client? From their
> > documentation, it doesn't seem like they provide an option to select the
> > "default Kafka partitioner".
> >
> > Thanks,
> > Jack
> >
> >
> > On Fri, Jul 29, 2016 at 7:42 AM, Gerard Klijs <gerard.klijs@dizzit.com>
> > wrote:
> >
> > > The default partitioner will take the key, make the hash from it, and do a
> > > modulo operation to determine the partition it goes to. Some things which
> > > might cause it to end up different for different topics:
> > > - the number of partitions is not the same (you already checked)
> > > - the key is not exactly the same, for example one might have a space after
> > >   the id
> > > - the other topic is configured to use another partitioner
> > > - the serialiser for the key is different for both topics, since the hash
> > >   is created based on the bytes of the key of the serialised message
> > > - all the topics use another partitioner (for example round robin)
> > >
> > > On Thu, Jul 28, 2016 at 9:11 PM Jack Huang <jackhuang@mz.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I have an application where I need to join events from two different
> > > > topics. Every event is identified by an id, which is used as the key for
> > > > the topic partition. After doing some experiments, I observed that events
> > > > will go into different partitions even if the number of partitions for
> > > > both topics is the same. I can't find any documentation on this point,
> > > > though. Does anyone know if this is indeed the case?
> > > >
> > > >
> > > > Thanks,
> > > > Jack
> > > >
> > >
> >
>
>
>
> --
> Thanks,
> Ewen
>
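
The scheme discussed in this thread (hash the serialised key bytes, then take
the result modulo the number of partitions) can be sketched in Python roughly
as follows. This is an illustration only: murmur2() is a hand-written port
modelled on the 32-bit murmur2 variant the Java client is understood to use,
and it has not been checked byte-for-byte against that client here.
other_client_partition() is a purely hypothetical stand-in for "a client with
a different hash"; it is not kafka-node's actual algorithm.

```python
import zlib


def murmur2(data: bytes) -> int:
    """32-bit murmur2-style hash, returned as an unsigned integer."""
    length = len(data)
    m = 0x5BD1E995
    mask = 0xFFFFFFFF
    h = 0x9747B28C ^ length  # assumed seed, taken from the Java client

    # Mix the input four bytes at a time, little-endian.
    for i in range(0, length - length % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> 24
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Fold in the trailing 1-3 bytes, if any.
    tail = data[length - length % 4:]
    if tail:
        h ^= int.from_bytes(tail, "little")
        h = (h * m) & mask

    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def java_style_partition(key: bytes, num_partitions: int) -> int:
    """Clear the sign bit, then take the hash modulo the partition count."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


def other_client_partition(key: bytes, num_partitions: int) -> int:
    """Hypothetical client that hashes with CRC32 instead (stand-in only)."""
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    key = "event-42".encode("utf-8")         # serialised key bytes
    padded = "event-42 ".encode("utf-8")     # same id with a trailing space
    for name, k in (("key", key), ("padded key", padded)):
        print(name,
              "murmur2:", java_style_partition(k, 8),
              "crc32:", other_client_partition(k, 8))
```

With the same partitioner, the same key bytes and the same partition count,
the result is the same for every topic. Changing any one of those three
(a different hash, a differently serialised key, or a different partition
count) can send the same id to a different partition.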
