kafka-users mailing list archives

From adrien ruffie <adriennolar...@hotmail.fr>
Subject RE: Delayed processing
Date Thu, 08 Mar 2018 21:58:00 GMT
Hello Wim,


I think it does matter, because one of Kafka's principal features is to load-balance messages
while guaranteeing their ordering (within a partition) in a distributed cluster.


The order of the messages should be guaranteed, except in a few cases.

Data loss might happen when:

1] The producer can cause data loss when block.on.buffer.full=false, when retries are exhausted,
or when messages are sent without acks=all.

2] Unclean leader election is enabled: if an out-of-sync follower becomes the new
leader, the messages that were not yet replicated to it are lost.


Message reordering might happen when:

1] max.in.flight.requests.per.connection > 1 and retries are enabled

2] a producer is not correctly closed, i.e. without calling .close()

The close method ensures that the accumulator is closed first, which guarantees that
no more appends are accepted after the send loop is broken.
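
To make the first reordering case concrete, here is a minimal sketch in plain Java. It does not use the Kafka client at all: the class InFlightReorderingDemo and its send method are hypothetical, and the model is deliberately simplified (batches are sent in rounds of at most maxInFlight, and a failed batch is retried in the next round). It only illustrates why a retry can land behind a batch that was sent later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of in-flight requests; not the Kafka client API.
final class InFlightReorderingDemo {

    // Returns the order in which batches end up in the log.
    // Batches listed in failFirstAttempt fail once and succeed on retry.
    static List<Integer> send(List<Integer> batches, int maxInFlight, Set<Integer> failFirstAttempt) {
        Deque<Integer> pending = new ArrayDeque<>(batches);
        Set<Integer> alreadyFailed = new HashSet<>();
        List<Integer> log = new ArrayList<>();

        while (!pending.isEmpty()) {
            // Put up to maxInFlight batches on the wire at the same time.
            List<Integer> inFlight = new ArrayList<>();
            while (inFlight.size() < maxInFlight && !pending.isEmpty()) {
                inFlight.add(pending.poll());
            }
            List<Integer> toRetry = new ArrayList<>();
            for (int batch : inFlight) {
                if (failFirstAttempt.contains(batch) && alreadyFailed.add(batch)) {
                    toRetry.add(batch);   // first attempt failed, retry later
                } else {
                    log.add(batch);       // acknowledged and appended to the log
                }
            }
            // Retries go back to the head of the queue, in their original order.
            for (int i = toRetry.size() - 1; i >= 0; i--) {
                pending.addFirst(toRetry.get(i));
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(send(List.of(1, 2, 3), 2, Set.of(1))); // batch 2 overtakes batch 1
        System.out.println(send(List.of(1, 2, 3), 1, Set.of(1))); // order is preserved
    }
}
```

With two batches in flight, batch 2 is appended while batch 1 is still waiting for its retry. With a single batch in flight, nothing can overtake the retry.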



If you want to avoid these cases:

- close the producer in the error callback

- close the producer with close(0) to prevent further sends after a previous message send has failed


Avoid data loss:

- block.on.buffer.full=true

- retries=Integer.MAX_VALUE (retries is an int setting)

- acks=all


Avoid reordering:

- max.in.flight.requests.per.connection=1 (be aware of the impact on latency)
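
The producer settings above could be collected roughly as follows. This is only a sketch: the class name SafeProducerSettings and the bootstrap address are placeholders, and block.on.buffer.full is left out on the assumption that the client version in use may no longer accept it. Only java.util.Properties is used, so the result still has to be handed to a real producer together with its serializers.

```java
import java.util.Properties;

// Hypothetical helper that gathers the "no loss, no reordering" producer settings.
final class SafeProducerSettings {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Wait for all in-sync replicas to acknowledge each write.
        props.setProperty("acks", "all");
        // retries is an int setting, so Integer.MAX_VALUE is the largest usable value.
        props.setProperty("retries", Integer.toString(Integer.MAX_VALUE));
        // Only one request in flight, so a retry cannot be overtaken by a later batch.
        props.setProperty("max.in.flight.requests.per.connection", "1");
        // Key and value serializers must be added before creating a producer.
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder address.
        build("localhost:9092").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```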


Pay attention: if your producer goes down, the messages still in its buffer will be lost ...
(perhaps maintain local storage if you are punctilious)

Moreover, at least two replicas are needed at any time to guarantee data persistence. For example:
replication factor = 3, min.insync.replicas = 2, unclean leader election disabled
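
As a small worked example of that arithmetic: with acks=all, a write is accepted only while at least min.insync.replicas replicas are in sync, so the number of replicas that can be lost while writes still succeed is the replication factor minus min.insync.replicas. The helper below is hypothetical and only does that subtraction.

```java
// Hypothetical helper; it only encodes the arithmetic, it does not talk to a cluster.
final class ReplicationMath {

    // Number of replicas that may be unavailable while acks=all writes are still accepted.
    static int replicaFailuresToleratedForWrites(int replicationFactor, int minInSyncReplicas) {
        if (minInSyncReplicas < 1 || minInSyncReplicas > replicationFactor) {
            throw new IllegalArgumentException(
                "min.insync.replicas must be between 1 and the replication factor");
        }
        return replicationFactor - minInSyncReplicas;
    }

    public static void main(String[] args) {
        // Replication factor 3 with min.insync.replicas 2.
        System.out.println(replicaFailuresToleratedForWrites(3, 2));
    }
}
```

With replication factor 3 and min.insync.replicas 2, one replica can be down and every acknowledged write still exists on at least two replicas.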


Also keep in mind that a consumer can lose messages when offsets are not committed correctly.
Disable automatic offset commits (enable.auto.commit=false) and commit offsets only after you have finished processing each message (or
process several messages, keeping track of them in a local memory buffer, and commit them all at once).
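
The effect of committing only after processing can be sketched without a broker. The code below is a plain-Java simulation, not the Kafka consumer API: FakePartition, run, and the crash parameter are all hypothetical. With a real consumer the equivalent would be enable.auto.commit=false plus an explicit commit call after the work is done.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory simulation of "process first, commit afterwards"; not the Kafka consumer API.
final class CommitAfterProcessing {

    // Hypothetical stand-in for a partition and its committed offset.
    static final class FakePartition {
        final List<String> records;
        long committedOffset = 0; // next offset to read after a restart

        FakePartition(List<String> records) {
            this.records = records;
        }
    }

    // Processes records starting at the committed offset and commits once per
    // batchSize processed records. If crashAtOffset >= 0, the run stops just
    // before processing that offset, without committing the pending work.
    static List<String> run(FakePartition partition, int batchSize, long crashAtOffset) {
        List<String> processed = new ArrayList<>();
        int sinceLastCommit = 0;
        for (long offset = partition.committedOffset; offset < partition.records.size(); offset++) {
            if (offset == crashAtOffset) {
                return processed; // crash: uncommitted records will be read again
            }
            processed.add(partition.records.get((int) offset)); // do the work first
            sinceLastCommit++;
            if (sinceLastCommit == batchSize) {
                partition.committedOffset = offset + 1;         // then commit
                sinceLastCommit = 0;
            }
        }
        partition.committedOffset = partition.records.size();   // commit the tail
        return processed;
    }

    public static void main(String[] args) {
        FakePartition p = new FakePartition(List.of("a", "b", "c", "d", "e"));
        System.out.println(run(p, 2, 3));  // first run crashes before offset 3
        System.out.println(run(p, 2, -1)); // restart resumes from the last commit
    }
}
```

In this sketch the record at offset 2 is processed twice after the restart, but none is skipped. The trade-off is possible duplicates instead of loss, so the processing step should tolerate reprocessing.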


I hope these suggestions help you 😊


Best regards,

Adrien

________________________________
From: Wim Van Leuven <wim.vanleuven@highestpoint.biz>
Sent: Thursday, 8 March 2018 21:35:13
To: users@kafka.apache.org
Subject: Delayed processing

Hello,

I'm wondering how to design a KStreams or regular Kafka application that
can hold off processing of messages until a future time.

This is related to the EU's data protection regulation: we can store raw messages
for a given time; afterwards we have to store the anonymised message. So, I
was thinking about branching the stream, anonymising the messages into a
waiting topic and then continuing from there once the retention time has passed.

But that approach has some caveats:

   - This is not an exact solution as order of events is not guaranteed: we
   might encounter a message that triggers the stop processing while some
   events arriving later should normally still pass
   - how to properly stop processing if we encounter a message that
   indicates not to continue?
   - ...

Are there better-known solutions or best practices to delay message
processing with Kafka streams / consumers+producers?

Thanks for any insights/help here!
-wim