kafka-users mailing list archives

From Niek Sanders <niek.sand...@gmail.com>
Subject Re: Embedding a broker into a producer?
Date Thu, 12 Apr 2012 16:33:31 GMT
Dealing with network/broker outage on the producer side is also
something that I've been trying to solve.

Having a hook for the producer to dump to a local file would probably
be the simplest solution.  In the event of a prolonged outage, this
file could be replayed once availability is restored.
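Such a hook can be sketched roughly as follows. This is an illustrative Python sketch, not an actual Kafka API: `send_fn` stands in for whatever the producer's send call is, and the spill-file format (one message per line) is an assumption.

```python
import os

class SpillingProducer:
    """Wraps a send function; on failure, appends the message to a
    local spill file so it can be replayed once the broker is back."""

    def __init__(self, send_fn, spill_path):
        self.send_fn = send_fn      # hypothetical stand-in for the producer's send
        self.spill_path = spill_path

    def send(self, message):
        try:
            self.send_fn(message)
        except Exception:
            # Broker unreachable (or send failed): dump to the local file.
            with open(self.spill_path, "a") as f:
                f.write(message + "\n")

    def replay(self):
        """Resend spilled messages after availability is restored.
        Returns the number of messages replayed."""
        if not os.path.exists(self.spill_path):
            return 0
        with open(self.spill_path) as f:
            lines = [ln.rstrip("\n") for ln in f]
        for ln in lines:
            self.send_fn(ln)        # may raise again if the broker is still down
        os.remove(self.spill_path)
        return len(lines)
```

A real version would need to handle partial replay failures and file rotation, but the shape is the same: never lose a message just because the broker was briefly unreachable.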

The current approach I've been taking:
1) My bridge code between my data source and the Kafka producer writes
everything to a local log file.  When the bridge starts up, it
generates a unique 8-character alphanumeric string.  Each entry it
writes to the local file is prefixed with both that alphanumeric
string and a log line number (0, 1, 2, 3, ...).  The data already
carries its own timestamps.
2) In the event of a network outage, or of Kafka being unable to keep
up with the producer, I simply drop the Kafka messages.  I never allow
my data source to block waiting on the Kafka producer/broker.
3) For given time ranges, my consumers track all the alphanumeric
identifiers that they consumed and the maximum complete sequence
number that they have seen.
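Step (1) above can be sketched in Python. The class name, field order, and space-delimited format are my own illustration of the scheme, not the actual bridge code:

```python
import random
import string

ALPHANUM = string.ascii_letters + string.digits

class LocalLogWriter:
    """Each run picks a random 8-char identifier and prefixes every
    entry with that identifier plus a monotonically increasing line
    number, so consumers can later request replays by (id, max_seq)."""

    def __init__(self, path):
        self.run_id = "".join(random.choice(ALPHANUM) for _ in range(8))
        self.seq = 0
        self.path = path

    def append(self, timestamp, payload):
        # Prefix: run identifier, then sequence number; the timestamp
        # comes with the data itself.
        line = f"{self.run_id} {self.seq} {timestamp} {payload}"
        with open(self.path, "a") as f:
            f.write(line + "\n")
        self.seq += 1
        return line
```

Because the identifier is random per process start, a restarted producer begins a fresh (id, 0) stream rather than needing to recover its last sequence number from disk.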

This lets me manually go back to the producers and replay any lost
data, whether it was never sent because of a network outage or lost
in a broker hardware failure.

I basically go to the producer machine (which I track in the Kafka
message body) and say: for time A to time B, I received data for these
identifiers up to these max sequence numbers: (najeh2wh, 12312),
(ji3njdKL, 71).  Replay anything I'm missing.
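The replay side of this is a simple filter over the local log. A minimal sketch, assuming the line format from the logging step (run id, then sequence number, then the rest); the function name and dict shape are hypothetical:

```python
def entries_to_replay(log_lines, acked):
    """Given local log lines formatted '<run_id> <seq> <rest...>' and a
    dict {run_id: max complete sequence number} reported by the
    consumer, return the lines the consumer is missing."""
    missing = []
    for line in log_lines:
        run_id, seq_str, _rest = line.split(" ", 2)
        seq = int(seq_str)
        # Anything past the acked max (or from an unreported run id,
        # which defaults to -1) was never fully consumed.
        if seq > acked.get(run_id, -1):
            missing.append(line)
    return missing
```

Note this treats an unreported identifier as entirely missing, which matches the "max complete sequence number" contract: the consumer only reports an id once it has seen a gap-free prefix of that stream.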

I use random identifier strings because it saves me from having to
persist the number of log lines my producer has generated (robustness
against producer failure).

- Niek

On Thu, Apr 12, 2012 at 7:12 AM, Edward Smith <esmith@stardotstar.org> wrote:
> Jun/Eric,
>
> [snip]
>
>  However, we have a requirement to support HA.  If I stick with the
> approach above, I have to worry about replicating/mirroring the
> queues, which always gets sticky.  We have to handle the case where a
> producer loses network connectivity, and so must be able to queue
> locally at the producer, which I believe means either putting the
> Kafka broker there or continuing to use some 'homebrew' local queue.
> With brokers on the same node as producers, consumers only have to HA
> the results of their processing, and I don't have to HA the queues.
>
>  Any thoughts or feedback from the group is welcome.
>
> Ed
>
