kafka-users mailing list archives

From Marko Bonaći <marko.bon...@sematext.com>
Subject Re: Tracking when a batch of messages has arrived?
Date Sun, 04 Dec 2016 16:34:00 GMT
Do you know in advance (when sending the first message) how many messages
that batch is going to have?


Marko Bonaći
Monitoring | Alerting | Anomaly Detection | Centralized Log Management
Solr & Elasticsearch Support
Sematext <http://sematext.com/> | Contact
<http://sematext.com/about/contact.html>

On Sat, Dec 3, 2016 at 1:01 AM, Ali Akhtar <ali.rac200@gmail.com> wrote:

> Hey Apurva,
>
> I am including the batch_id inside the messages.
>
> Could you give me an example of what you mean by custom control messages
> with a control topic please?
>
>
>
> On Sat, Dec 3, 2016 at 12:35 AM, Apurva Mehta <apurva@confluent.io> wrote:
>
> > That should work, though it sounds like you may be interested in:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging
> >
> > If you can include the 'batch_id' inside your messages, and define custom
> > control messages with a control topic, then you would not need one topic
> > per batch, and you would be very close to the essence of the above
> > proposal.
> >
> > Thanks,
> > Apurva
> >
> > On Fri, Dec 2, 2016 at 5:02 AM, Ali Akhtar <ali.rac200@gmail.com> wrote:
> >
> > > Heya,
> > >
> > > I need to send a group of messages, which are all related, and then process
> > > those messages, only when all of them have arrived.
> > >
> > > Here is how I'm planning to do this. Is this the right way, and can any
> > > improvements be made to this?
> > >
> > > 1) Send a message to a topic called batch_start, with a batch id (which
> > > will be a UUID)
> > >
> > > 2) Post the messages to a topic called batch_msgs_<batch_id>. Here batch_id
> > > will be the batch id sent in batch_start.
> > >
> > > The number of messages sent will be recorded by the producer.
> > >
> > > 3) Send a message to batch_end with the batch id and the number of sent
> > > messages.
> > >
> > > 4) On the consumer side, using Kafka Streaming, I would listen to
> > > batch_end.
> > >
> > > 5) When the message there arrives, I will start another instance of Kafka
> > > Streaming, which will process the messages in batch_msgs_<batch_id>.
> > >
> > > 6) Perhaps to be extra safe, whenever batch_end arrives, I will start a
> > > throwaway consumer which will just count the number of messages in
> > > batch_msgs_<batch_id>. If these don't match the # of messages specified in
> > > the batch_end message, then it will assume that the batch hasn't yet
> > > finished arriving, and it will wait for some time before retrying. Once the
> > > correct # of messages have arrived, THEN it will trigger step 5 above.
> > >
> > > Will the above method work, or should I make any changes to it?
> > >
> > > Is step 6 necessary?
> > >
> >
>
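
Apurva's single-topic suggestion above (a batch_id inside every message, plus a control message announcing how many messages the batch contains) could be sketched roughly as follows. This is a minimal in-memory sketch in Python, not Kafka client code: BatchTracker, on_data and on_batch_end are hypothetical names, and the consumer loop that would feed it from the data topic and the control topic is assumed and left out. It folds in the count check from step 6 and tolerates the control message arriving before the last data message.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class BatchTracker:
    """Buffers payloads per batch_id until the announced count has arrived.

    Hypothetical helper for illustration only; it is not part of any Kafka API.
    """

    def __init__(self) -> None:
        # batch_id -> payloads received so far
        self._messages: Dict[str, List[str]] = defaultdict(list)
        # batch_id -> count announced by the batch_end control message
        self._expected: Dict[str, int] = {}

    def on_data(self, batch_id: str, payload: str) -> Optional[List[str]]:
        """Record one data message; return the full batch if it is now complete."""
        self._messages[batch_id].append(payload)
        return self._pop_if_complete(batch_id)

    def on_batch_end(self, batch_id: str, expected_count: int) -> Optional[List[str]]:
        """Record the control message; return the full batch if it is already complete."""
        self._expected[batch_id] = expected_count
        return self._pop_if_complete(batch_id)

    def _pop_if_complete(self, batch_id: str) -> Optional[List[str]]:
        expected = self._expected.get(batch_id)
        # Not complete until the control message has been seen AND
        # at least the announced number of data messages has been buffered.
        if expected is None or len(self._messages[batch_id]) < expected:
            return None
        del self._expected[batch_id]
        return self._messages.pop(batch_id)


if __name__ == "__main__":
    tracker = BatchTracker()
    tracker.on_data("b1", "m1")
    tracker.on_batch_end("b1", 2)        # control message arrives early
    print(tracker.on_data("b1", "m2"))   # batch is released once the count matches
```

With at-least-once delivery a message may be redelivered, so a real consumer would probably also need to deduplicate (for example by a per-message id); this sketch does not attempt that.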
