kafka-users mailing list archives

From Niek Sanders <niek.sand...@gmail.com>
Subject Re: Embedding a broker into a producer?
Date Thu, 12 Apr 2012 16:33:31 GMT
Dealing with a network or broker outage on the producer side is also
something I've been trying to solve.

Having a hook for the producer to dump to a local file would probably
be the simplest solution.  In the event of a prolonged outage, this
file could be replayed once availability is restored.
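That hook could look something like the following rough sketch (send_fn,
the spill-file path, and the one-message-per-line format are just my
illustration, not an actual Kafka API):

```python
import os

def send_with_spill(send_fn, message, spill_path):
    """Try to send; on failure, append the message to a local spill file
    so it can be replayed later. send_fn stands in for whatever producer
    send call is in use."""
    try:
        send_fn(message)
    except Exception:
        with open(spill_path, "a") as f:
            f.write(message + "\n")

def replay_spill(send_fn, spill_path):
    """Once availability is restored, replay everything in the spill file
    and remove it. Returns the number of messages replayed."""
    if not os.path.exists(spill_path):
        return 0
    count = 0
    with open(spill_path) as f:
        for line in f:
            send_fn(line.rstrip("\n"))
            count += 1
    os.remove(spill_path)
    return count
```

Obviously a real version would need to worry about partial writes and
ordering relative to messages sent after the outage.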

The current approach I've been taking:
1) My bridge code between my data source and the Kafka producer writes
everything to a local log file.  When the bridge starts up, it
generates a unique 8-character alphanumeric string.  Each entry it
writes to the local file is prefixed with both the alphanumeric
string and a log line number (0, 1, 2, 3, ...).  The data already
arrives with timestamps.
2) In the event of a network outage, or Kafka being unable to keep up
with the producer, I simply drop the Kafka messages.  I never allow my
data source to block while I'm waiting on Kafka.
3) For given time ranges, my consumers track all the alphanumeric
identifiers that they consumed and the maximum complete sequence
number that they have seen.
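The producer-side logging in step 1 is roughly this (simplified sketch;
class and field names are made up for illustration):

```python
import random
import string

ALPHANUMERIC = string.ascii_letters + string.digits

class LocalLogWriter:
    """Writes each entry prefixed with a per-run random id and a
    monotonically increasing sequence number."""

    def __init__(self, path):
        # A fresh 8-char id per bridge startup means we never have to
        # persist a counter across restarts (robust to producer crashes).
        self.run_id = "".join(random.choice(ALPHANUMERIC) for _ in range(8))
        self.seq = 0
        self.path = path

    def format_entry(self, timestamp, payload):
        # Prefix with run id and line number; the timestamp already
        # comes with the data itself.
        line = "%s %d %s %s\n" % (self.run_id, self.seq, timestamp, payload)
        self.seq += 1
        return line

    def write(self, timestamp, payload):
        with open(self.path, "a") as f:
            f.write(self.format_entry(timestamp, payload))
```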

So I can manually go back to producers and replay any lost data,
whether it was never sent because of a network outage or was lost to
a broker hardware failure.

I basically go to the producer machine (which I track in the Kafka
message body) and say: for time A to time B, I received data for these
identifiers and max sequence numbers: (najeh2wh, 12312), (ji3njdKL,
71).  Replay anything that I'm missing.
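Picking out what to replay from the producer's local log is then a
simple filter, sketched here with the example numbers above (function
name and tuple layout are illustrative; a real version would also
restrict to the time range A..B):

```python
def entries_to_replay(log_entries, acked):
    """log_entries: parsed (run_id, seq, payload) tuples from the local
    log.  acked: the consumer's report, a dict of run_id -> max complete
    sequence number seen.  Returns everything the consumer is missing."""
    missing = []
    for run_id, seq, payload in log_entries:
        # Anything past the max complete seq, or from a run id the
        # consumer never saw at all, was not fully received.
        if seq > acked.get(run_id, -1):
            missing.append((run_id, seq, payload))
    return missing
```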

I use random identifier strings because they save me from having to
persist the number of log lines my producer has generated
(robustness against producer failure).

- Niek

On Thu, Apr 12, 2012 at 7:12 AM, Edward Smith <esmith@stardotstar.org> wrote:
> Jun/Eric,
> [snip]
>  However, we have a requirement to support HA.  If I stick with the
> approach above, I have to worry about replicating/mirroring the
> queues, which always gets sticky.  We have to handle the case where a
> producer loses network connectivity, and so must be able to queue
> locally at the producer, which I believe means either putting the Kafka
> broker there or continuing to use some 'homebrew' local queue.  With
> brokers on the same node as producers, consumers only have to HA the
> results of their processing, and I don't have to HA the queues.
>  Any thoughts or feedback from the group are welcome.
> Ed
