kafka-users mailing list archives

From Ali Akhtar <ali.rac...@gmail.com>
Subject Re: Tracking when a batch of messages has arrived?
Date Sun, 04 Dec 2016 16:35:51 GMT
I don't - it would require fetching all messages and iterating over them
just to count them, which is expensive. I know the counts after they have
been sent.
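
A rough sketch of counting on the producer side while the messages go out,
so that nothing has to be fetched and iterated beforehand. The `send`
callable stands in for a real producer client, and `send_batch`, the topic
names `batch_msgs` and `batch_control`, and the `type`/`count` fields are
hypothetical names chosen for illustration, not anything defined by Kafka.

```python
import uuid
from typing import Any, Callable, Dict, Iterable, Tuple

Message = Dict[str, Any]


def send_batch(
    send: Callable[[str, Message], None],
    payloads: Iterable[Any],
    data_topic: str = "batch_msgs",        # hypothetical topic name
    control_topic: str = "batch_control",  # hypothetical topic name
) -> Tuple[str, int]:
    """Send every payload tagged with one batch id, then an end marker.

    The count is accumulated while sending, so `payloads` can be a lazy
    iterator and is only walked once.
    """
    batch_id = str(uuid.uuid4())
    count = 0
    for payload in payloads:
        send(data_topic, {"batch_id": batch_id, "payload": payload})
        count += 1
    # The end marker carries the final count, known only after sending.
    send(control_topic, {"batch_id": batch_id, "type": "end", "count": count})
    return batch_id, count
```

With a real client, `send` would wrap whatever produce call that client
offers; here it can be any function taking a topic name and a message.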

On Sun, Dec 4, 2016 at 9:34 PM, Marko Bonaći <marko.bonaci@sematext.com>
wrote:

> Do you know in advance (when sending the first message) how many messages
> that batch is going to have?
>
>
> Marko Bonaći
> Monitoring | Alerting | Anomaly Detection | Centralized Log Management
> Solr & Elasticsearch Support
> Sematext <http://sematext.com/> | Contact
> <http://sematext.com/about/contact.html>
>
> On Sat, Dec 3, 2016 at 1:01 AM, Ali Akhtar <ali.rac200@gmail.com> wrote:
>
> > Hey Apurva,
> >
> > I am including the batch_id inside the messages.
> >
> > Could you give me an example of what you mean by custom control messages
> > with a control topic please?
> >
> >
> >
> > On Sat, Dec 3, 2016 at 12:35 AM, Apurva Mehta <apurva@confluent.io> wrote:
> >
> > > That should work, though it sounds like you may be interested in:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
> > >
> > > If you can include the 'batch_id' inside your messages, and define
> > > custom control messages with a control topic, then you would not need
> > > one topic per batch, and you would be very close to the essence of the
> > > above proposal.
> > >
> > > Thanks,
> > > Apurva
> > >
> > > On Fri, Dec 2, 2016 at 5:02 AM, Ali Akhtar <ali.rac200@gmail.com> wrote:
> > >
> > > > Heya,
> > > >
> > > > I need to send a group of messages, which are all related, and then
> > > > process those messages, only when all of them have arrived.
> > > >
> > > > Here is how I'm planning to do this. Is this the right way, and can any
> > > > improvements be made to this?
> > > >
> > > > 1) Send a message to a topic called batch_start, with a batch id (which
> > > > will be a UUID)
> > > >
> > > > 2) Post the messages to a topic called batch_msgs_<batch_id>. Here
> > > > batch_id will be the batch id sent in batch_start.
> > > >
> > > > The number of messages sent will be recorded by the producer.
> > > >
> > > > 3) Send a message to batch_end with the batch id and the number of sent
> > > > messages.
> > > >
> > > > 4) On the consumer side, using Kafka Streaming, I would listen to
> > > > batch_end.
> > > >
> > > > 5) When the message there arrives, I will start another instance of
> > > > Kafka Streaming, which will process the messages in
> > > > batch_msgs_<batch_id>
> > > >
> > > > 6) Perhaps to be extra safe, whenever batch_end arrives, I will start a
> > > > throwaway consumer which will just count the number of messages in
> > > > batch_msgs_<batch_id>. If these don't match the # of messages specified
> > > > in the batch_end message, then it will assume that the batch hasn't yet
> > > > finished arriving, and it will wait for some time before retrying. Once
> > > > the correct # of messages have arrived, THEN it will trigger step 5
> > > > above.
> > > >
> > > > Will the above method work, or should I make any changes to it?
> > > >
> > > > Is step 6 necessary?
> > > >
> > >
> >
>
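
For the consumer side, the "batch_id inside the messages plus a control
message" idea from the thread can be sketched roughly as below. This is a
minimal in-memory Python illustration, not a Kafka Streams implementation:
`BatchTracker`, `on_data`, `on_end`, and the message fields are hypothetical
names. It assumes the end marker carries the expected count, and it allows
the marker to arrive before or after the data, since Kafka only guarantees
ordering within a single partition.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


class BatchTracker:
    """Buffer messages per batch id and release a batch once complete."""

    def __init__(self) -> None:
        self._buffered: Dict[str, List[Any]] = defaultdict(list)
        self._expected: Dict[str, int] = {}

    def on_data(self, msg: Dict[str, Any]) -> Optional[List[Any]]:
        """Handle a data message; return the full batch if now complete."""
        batch_id = msg["batch_id"]
        self._buffered[batch_id].append(msg["payload"])
        return self._release_if_complete(batch_id)

    def on_end(self, msg: Dict[str, Any]) -> Optional[List[Any]]:
        """Handle an end marker carrying the number of messages sent."""
        batch_id = msg["batch_id"]
        self._expected[batch_id] = msg["count"]
        return self._release_if_complete(batch_id)

    def _release_if_complete(self, batch_id: str) -> Optional[List[Any]]:
        expected = self._expected.get(batch_id)
        # Not complete until the end marker is seen and the count is reached.
        if expected is None or len(self._buffered[batch_id]) < expected:
            return None
        del self._expected[batch_id]
        return self._buffered.pop(batch_id)


tracker = BatchTracker()
tracker.on_data({"batch_id": "b1", "payload": "x"})          # still waiting
tracker.on_end({"batch_id": "b1", "count": 2})               # still waiting
batch = tracker.on_data({"batch_id": "b1", "payload": "y"})  # batch complete
```

The sketch does not deduplicate, so redelivered messages would be counted
twice; a real consumer would also need to persist this state and expire
batches that never complete.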
