kafka-users mailing list archives

From Abhit Kalsotra <abhit...@gmail.com>
Subject Re: Regarding Kafka
Date Sun, 09 Oct 2016 07:39:15 GMT
What about the order in which messages are received if I don't specify the
partition?

Say I have user ID 4456 and need to run some analytics on the Kafka
consumer side. If the messages are not consumed in the order I sent them,
my analytics will go haywire.
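
To make the concern concrete, here is a minimal sketch of why pinning each user ID to one partition keeps that user's messages in order. It does not call librdkafka: partitions are simulated as in-memory vectors, and `Message`, `partitionForUser`, `routeMessages`, and `consumedSeqsForUser` are hypothetical names used only for illustration. It assumes the partition count stays fixed, since changing it would remap users to different partitions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical message type: a user ID plus a per-user sequence number.
struct Message {
    std::uint64_t userId;
    int seq;
};

// Hypothetical helper: map a user ID to a partition index with a modulo.
// Assumes numPartitions is fixed for the lifetime of the topic.
int partitionForUser(std::uint64_t userId, int numPartitions) {
    assert(numPartitions > 0);
    return static_cast<int>(userId % static_cast<std::uint64_t>(numPartitions));
}

// Simulate producing: append each message to the log of its partition,
// in the order the producer sent it.
std::vector<std::vector<Message>> routeMessages(const std::vector<Message>& msgs,
                                                int numPartitions) {
    std::vector<std::vector<Message>> partitions(numPartitions);
    for (const Message& m : msgs) {
        partitions[partitionForUser(m.userId, numPartitions)].push_back(m);
    }
    return partitions;
}

// Simulate consuming: read the user's partition front to back and collect
// the sequence numbers seen for that user.
std::vector<int> consumedSeqsForUser(const std::vector<std::vector<Message>>& partitions,
                                     std::uint64_t userId) {
    const int p = partitionForUser(userId, static_cast<int>(partitions.size()));
    std::vector<int> seqs;
    for (const Message& m : partitions[p]) {
        if (m.userId == userId) {
            seqs.push_back(m.seq);
        }
    }
    return seqs;
}
```

With 50 partitions, user 4456 always lands in partition 6, so its messages come back in the order they were appended even when other users' messages are interleaved on the producer side. Order across different partitions is not covered by this sketch.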

Abhi

On Sun, Oct 9, 2016 at 12:50 PM, Hans Jespersen <hans@confluent.io> wrote:

> You don't even have to do that because the default partitioner will spread
> the data you publish to the topic over the available partitions for you.
> Just try it out to see. Publish multiple messages to the topic without
> using keys, and without specifying a partition, and observe that they are
> automatically distributed out over the available partitions.
>
>
> //hans@confluent.io
> -------- Original message --------
> From: Abhit Kalsotra <abhit011@gmail.com>
> Date: 10/8/16  11:19 PM  (GMT-08:00)
> To: users@kafka.apache.org
> Subject: Re: Regarding Kafka
> Hans
>
> Thanks for the response. Yes, you could say I am treating topics like
> partitions, because my current logic for producing to a respective topic
> goes something like this:
>
> RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic[whichTopic],
>                                                    partition,
>                                                    RdKafka::Producer::RK_MSG_COPY,
>                                                    ptr,
>                                                    size,
>                                                    &partitionKey,
>                                                    NULL);
> where partitionKey is a unique number or user ID. Currently I compute
> partitionKey % 10 and produce to the topic that matches the remainder.
>
> But as per your suggestion, let me create close to 40-50 partitions for a
> single topic, and when producing I will do something like this:
>
> RdKafka::ErrorCode resp = m_kafkaProducer->produce(m_kafkaTopic,
>                                                    partition % 50,
>                                                    RdKafka::Producer::RK_MSG_COPY,
>                                                    ptr,
>                                                    size,
>                                                    &partitionKey,
>                                                    NULL);
>
> Abhi
>
> On Sun, Oct 9, 2016 at 10:13 AM, Hans Jespersen <hans@confluent.io> wrote:
>
> > Why do you have 10 topics?  It seems like you are treating topics like
> > partitions and it's unclear why you don't just have 1 topic with 10, 20,
> > or even 30 partitions. Ordering is only guaranteed at a partition level.
> >
> > In general, if you want to capacity plan for partitions, you benchmark a
> > single partition and then divide your estimated peak throughput by the
> > throughput measured for that single partition.
> >
> > If you expect the peak throughput to increase over time then double your
> > partition count to allow room to grow the number of consumers without
> > having to repartition.
> >
> > Sizing can be a bit more tricky if you are using keys but it doesn't
> > sound like you are if today you are publishing to topics the way you
> > describe.
> >
> > -hans
> >
> > > On Oct 8, 2016, at 9:01 PM, Abhit Kalsotra <abhit011@gmail.com> wrote:
> > >
> > > Guys, any views?
> > >
> > > Abhi
> > >
> > >> On Sat, Oct 8, 2016 at 4:28 PM, Abhit Kalsotra <abhit011@gmail.com> wrote:
> > >>
> > >> Hello
> > >>
> > >> I am using the librdkafka C++ library for my application.
> > >>
> > >> *My Kafka Cluster Setup*
> > >> 2 ZooKeeper nodes running on 2 different instances
> > >> 7 Kafka brokers, 4 running on one machine and 3 on another
> > >> 10 topics in total, with a partition count of 3 and a replication
> > >> factor of 3.
> > >>
> > >> Now in my case I need to be very specific about the *message order*
> > >> when I am consuming the messages. I know that if all the messages are
> > >> produced to the same partition, they are always consumed in the same
> > >> order.
> > >>
> > >> I need expert opinions on the ideal partition count I should consider
> > >> without affecting performance (I am looking for close to 100,000
> > >> messages per second).
> > >> The topics are numbered 0 to 9, and when producing messages I compute
> > >> uniqueUserId % 10 and then produce to the respective topic: 0, 1, 2,
> > >> etc.
> > >>
> > >> Abhi
> > >>
> > >>
> > >>
> > >>
> > >> --
> > >> If you can't succeed, call it version 1.0
> > >>
> > >
> > >
> > >
> > > --
> > > If you can't succeed, call it version 1.0
> >
>
>
>
> --
> If you can't succeed, call it version 1.0
>
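
For reference, the sizing rule Hans describes in the quoted message (benchmark one partition, divide the estimated peak throughput by that result, then double the count if growth is expected) can be sketched as simple arithmetic. The function name `suggestedPartitionCount` and the `growthFactor` parameter are hypothetical, and the single-partition throughput has to come from your own benchmark; nothing in this thread supplies that number.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: partitions needed to carry the peak load, rounded up,
// then multiplied by a growth factor (2 = "double it to leave room to grow").
int suggestedPartitionCount(double peakMsgsPerSec,
                            double singlePartitionMsgsPerSec,
                            int growthFactor = 2) {
    assert(peakMsgsPerSec > 0.0);
    assert(singlePartitionMsgsPerSec > 0.0);
    assert(growthFactor >= 1);
    const int base = static_cast<int>(
        std::ceil(peakMsgsPerSec / singlePartitionMsgsPerSec));
    return base * growthFactor;
}
```

As an illustration only: with the 100,000 messages per second target from this thread and an assumed benchmark of 10,000 messages per second on one partition, the base count is 10 and the doubled count is 20.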



-- 
If you can't succeed, call it version 1.0
