kafka-users mailing list archives

From Hans Jespersen <h...@confluent.io>
Subject Re: Regarding Kafka
Date Sun, 09 Oct 2016 16:38:09 GMT
I'm pretty sure Jun was talking about the Java API in the quoted blog text, not librdkafka.
There is only one thread in the new Java consumer, so you wouldn't see this behavior. I do
not think that librdkafka makes any such guarantee to dispatch unique keys to each thread,
but I'm not an expert in librdkafka, so others may be able to help you better on that.
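
If every message for a given ID has to be handled by the same analytics thread, one option that does not rely on any client-library guarantee is to do the routing in the application: consume in one place and hand each message to a worker chosen from its key. Below is a minimal sketch in plain C++, with no librdkafka calls. `Message`, `worker_for_key`, and `dispatch` are hypothetical names used only for illustration, and the modulo rule is an assumption that fits numeric IDs like the ones in the logs below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a consumed message: only the fields this sketch needs.
struct Message {
    std::uint64_t id;      // the key used when producing, e.g. the user ID
    std::int64_t  offset;  // offset within the partition
};

// Pick a worker for a key. The same key always maps to the same worker.
inline std::size_t worker_for_key(std::uint64_t key, std::size_t num_workers) {
    assert(num_workers > 0);
    return static_cast<std::size_t>(key % num_workers);
}

// Split one consumed stream into per-worker queues, preserving arrival order
// within each queue. In a real application each queue would feed one thread.
inline std::vector<std::vector<Message>> dispatch(const std::vector<Message>& in,
                                                  std::size_t num_workers) {
    std::vector<std::vector<Message>> queues(num_workers);
    for (const Message& m : in) {
        queues[worker_for_key(m.id, num_workers)].push_back(m);
    }
    return queues;
}
```

With 4 workers, every message for ID 4495 lands in the same queue in arrival order, regardless of which thread originally pulled it from Kafka.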
//hans@confluent.io
-------- Original message --------
From: Abhit Kalsotra <abhit011@gmail.com>
Date: 10/9/16 3:58 AM (GMT-08:00)
To: users@kafka.apache.org
Subject: Re: Regarding Kafka
I did that, but I am getting confusing results.

For example:

I have created 4 Kafka consumer threads for data analytics. These
threads just wait for Kafka messages to consume.
I provide a key when I produce, which means that all the messages
with that key should go to one single partition, ref "
http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
"
"* On the consumer side, Kafka always gives a single partition’s data to
one consumer thread.*"
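
To make the keyed-partitioning idea concrete, here is a small sketch of how a key can determine a partition. It is illustrative only: the plain modulo is an assumption, real clients hash the key bytes with their own partitioner and may differ from one another, and `partition_for_key` and `group_by_partition` are hypothetical helpers, not librdkafka functions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Illustrative only: map a numeric key to a partition with a plain modulo.
inline std::int32_t partition_for_key(std::uint64_t key, std::int32_t num_partitions) {
    assert(num_partitions > 0);
    return static_cast<std::int32_t>(key % static_cast<std::uint64_t>(num_partitions));
}

// Group keys by the partition they would land in, keeping input order per partition.
inline std::map<std::int32_t, std::vector<std::uint64_t>>
group_by_partition(const std::vector<std::uint64_t>& keys, std::int32_t num_partitions) {
    std::map<std::int32_t, std::vector<std::uint64_t>> out;
    for (std::uint64_t k : keys) {
        out[partition_for_key(k, num_partitions)].push_back(k);
    }
    return out;
}
```

Note that the mapping is many-to-one: a key always lands in the same partition, but several different keys can share that partition.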

If you look at my application logs, my 4 Kafka consumer application threads
are all calling consume(). Shouldn't all messages for a particular ID
be consumed by one Kafka application thread?

[2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 74 ][ID ID
date:2016-09-28 20:07:32.000 ]
[2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4496 offset: 80 ][ID ID
date: 2016-09-28 20:07:39.000 ]
[2016-10-08 23:37:07.498]AxThreadId 2208 ->ID:4495 offset: 77 ][ID
date: 2016-09-28 20:07:35.000 ]
[2016-10-08 23:37:07.498]AxThreadId 23516 ->ID:4495 offset: 76][ID
date: 2016-09-28 20:07:34.000 ]
[2016-10-08 23:37:07.498]AxThreadId 9540 ->ID:4495 offset: 75 ][ID
date: 2016-09-28 20:07:33.000 ]
[2016-10-08 23:37:07.499]AxThreadId 23516 ->ID:4495 offset: 78 ][ID
date: 2016-09-28 20:07:36.000 ]
[2016-10-08 23:37:07.499]AxThreadId 2208 ->ID:4495 offset: 79 ][ID
date: 2016-09-28 20:07:37.000 ]
[2016-10-08 23:37:07.499]AxThreadId 9540 ->ID:4495 offset: 80 ][ID
date: 2016-09-28 20:07:38.000 ]
[2016-10-08 23:37:07.500]AxThreadId 23516 ->ID:4495 offset: 81][ID
date: 2016-09-28 20:07:39.000 ]




On Sun, Oct 9, 2016 at 1:31 PM, Hans Jespersen <hans@confluent.io> wrote:

> Then publish with the user ID as the key and all messages for the same key
> will be guaranteed to go to the same partition and therefore be in order
> for whichever consumer gets that partition.
>
>
> //hans@confluent.io
> -------- Original message --------
> From: Abhit Kalsotra <abhit011@gmail.com>
> Date: 10/9/16 12:39 AM (GMT-08:00)
> To: users@kafka.apache.org
> Subject: Re: Regarding Kafka
> What about the order in which messages are received if I don't specify the
> partition?
>
> Let's say I have user ID 4456 and I have to do some analytics at the
> Kafka consumer end. If the messages are not consumed in the order
> I sent them, then my analytics will go haywire.
>
> Abhi
>
> On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <hans@confluent.io> wrote:
>
> > You don't even have to do that because the default partitioner will spread
> > the data you publish to the topic over the available partitions for you.
> > Just try it out to see. Publish multiple messages to the topic without
> > using keys, and without specifying a partition, and observe that they are
> > automatically distributed out over the available partitions.
> >
> >
> > //hans@confluent.io
> > -------- Original message --------
> > From: Abhit Kalsotra <abhit011@gmail.com>
> > Date: 10/8/16 11:19 PM (GMT-08:00)
> > To: users@kafka.apache.org
> > Subject: Re: Regarding Kafka
> > Hans,
> >
> > Thanks for the response. Yes, you could say I am treating topics like
> > partitions, because my current logic for producing to a given topic
> > goes something like this:
> >
> > RdKafka::ErrorCode resp = m_kafkaProducer->produce(
> >     m_kafkaTopic[whichTopic],
> >     partition,
> >     RdKafka::Producer::RK_MSG_COPY,
> >     ptr,
> >     size,
> >     &partitionKey,
> >     NULL);
> > where partitionKey is a unique number or user ID. What I am doing currently
> > is computing partitionKey % 10 for each message and, whatever the remainder
> > is, producing the message to the corresponding topic.
> >
> > But as per your suggestion, let me create close to 40-50 partitions for a
> > single topic, and when I am producing I will do something like this:
> >
> > RdKafka::ErrorCode resp = m_kafkaProducer->produce(
> >     m_kafkaTopic,
> >     partition % 50,
> >     RdKafka::Producer::RK_MSG_COPY,
> >     ptr,
> >     size,
> >     &partitionKey,
> >     NULL);
> >
> > Abhi
> >
> > On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen <hans@confluent.io> wrote:
> >
> > > Why do you have 10 topics? It seems like you are treating topics like
> > > partitions, and it's unclear why you don't just have 1 topic with 10, 20,
> > > or even 30 partitions. Ordering is only guaranteed at a partition level.
> > >
> > > In general, if you want to capacity plan for partitions, you benchmark a
> > > single partition and then divide your estimated peak throughput by the
> > > throughput measured for that single partition.
> > >
> > > If you expect the peak throughput to increase over time, then double your
> > > partition count to allow room to grow the number of consumers without
> > > having to repartition.
> > >
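
The sizing rule quoted above (benchmark one partition, divide the peak by it, then double for growth) can be written as a short calculation. This is a sketch under stated assumptions: `partitions_needed` is a hypothetical helper, the per-partition figure has to come from your own benchmark, and the default growth factor of 2 simply encodes the "double your partition count" advice.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Partitions needed = ceil(peak / per-partition throughput), scaled by a growth factor.
// per_partition_msgs_per_sec must be measured by benchmarking a single partition.
inline std::int64_t partitions_needed(double peak_msgs_per_sec,
                                      double per_partition_msgs_per_sec,
                                      double growth_factor = 2.0) {
    assert(peak_msgs_per_sec > 0);
    assert(per_partition_msgs_per_sec > 0);
    assert(growth_factor >= 1.0);
    const double base = std::ceil(peak_msgs_per_sec / per_partition_msgs_per_sec);
    return static_cast<std::int64_t>(std::ceil(base * growth_factor));
}
```

For the 100,000 messages per second mentioned in the original question and a purely hypothetical benchmark result of 20,000 messages per second per partition, `partitions_needed(100000, 20000)` returns 10.
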
> > > Sizing can be a bit more tricky if you are using keys, but it doesn't sound
> > > like you are, if today you are publishing to topics the way you describe.
> > >
> > > -hans
> > >
> > > > On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra <abhit011@gmail.com> wrote:
> > > >
> > > > Guys, any views?
> > > >
> > > > Abhi
> > > >
> > > >> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra <abhit011@gmail.com> wrote:
> > > >>
> > > >> Hello,
> > > >>
> > > >> I am using the librdkafka C++ library for my application.
> > > >>
> > > >> *My Kafka cluster setup*
> > > >> 2 Kafka ZooKeeper nodes running on 2 different instances
> > > >> 7 Kafka brokers: 4 running on one machine and 3 running on the other
> > > >> 10 topics in total; partition count is 3, with a replication factor of 3
> > > >>
> > > >> In my case I need to be very specific about the *message order* when I
> > > >> am consuming the messages. I know that if all the messages are produced to
> > > >> the same partition, they are always consumed in the same order.
> > > >>
> > > >> I need expert opinions on the ideal partition count I should
> > > >> consider without affecting performance (I am looking for close to
> > > >> 100,000 messages per second).
> > > >> The topics are numbered 0 to 9, and when I am producing messages I compute
> > > >> something like uniqueUserId % 10 and then produce to the corresponding
> > > >> topic, such as 0, 1, 2, etc.
> > > >>
> > > >> Abhi
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> If you can't succeed, call it version 1.0
-- 
If you can't succeed, call it version 1.0