kafka-users mailing list archives

From Jun Rao <jun...@gmail.com>
Subject Re: producer behavior when network is down
Date Tue, 13 Aug 2013 04:02:50 GMT
In Kafka, we detect failures using ZK. So, if the network connectivity between
the producer and the broker is down, but the one between the broker and ZK is
up, we assume the broker is still alive and will continue to send the data
to it. Within the same data center, we assume this is extremely rare.
If the network connectivity between the broker and ZK is also down, the
producer will be able to automatically detect the failure and send data to
other brokers.

In 0.7, there is no way to check whether a message is really sent or not,
since the producer requests don't receive any acknowledgement. This is
changed in 0.8, where a producer can choose when to receive an ack (see
request.required.acks in
http://kafka.apache.org/documentation.html#producerconfigs).
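
As a rough illustration of the 0.8 setting mentioned above, the producer
properties could be assembled along these lines. This is only a sketch: the
class name, the helper method and the broker address are made up for the
example, and it builds a plain java.util.Properties object rather than a real
producer. In 0.8, request.required.acks=0 means no ack, 1 means an ack from
the partition leader, and -1 means an ack once all in-sync replicas have the
message.

```java
import java.util.Properties;

// Hypothetical helper (not part of Kafka): collects 0.8 producer settings.
public class AckProducerProps {

    // brokerList and requiredAcks are caller-supplied; the values used in
    // main() below are placeholders, not recommendations.
    static Properties build(String brokerList, int requiredAcks) {
        Properties props = new Properties();
        // 0.8 producers bootstrap from a broker list.
        props.setProperty("metadata.broker.list", brokerList);
        props.setProperty("serializer.class", "kafka.serializer.StringEncoder");
        // A sync producer surfaces send failures to the caller.
        props.setProperty("producer.type", "sync");
        // 0 = no ack, 1 = leader ack, -1 = all in-sync replicas.
        props.setProperty("request.required.acks", Integer.toString(requiredAcks));
        return props;
    }

    public static void main(String[] args) {
        // "broker1:9092" is an assumed address for illustration only.
        Properties props = build("broker1:9092", 1);
        props.list(System.out);
    }
}
```

The resulting Properties object would then be handed to the 0.8 producer's
config class; with acks set to 1 or -1 and a sync producer, a send that is
never acknowledged should show up as an exception instead of a silent loss.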

Thanks,

Jun


On Fri, Jul 26, 2013 at 9:27 AM, Viktor Kolodrevskiy <
viktor.kolodrevskiy@gmail.com> wrote:

> Hey guys,
>
> We decided to use Kafka in our new project, and I have spent some time
> researching how the Kafka producer behaves during network connectivity
> problems.
>
> I had 3 virtual machines (Ubuntu 13.04, running on VirtualBox) in one
> network:
>
> 1. Kafka server (0.7.2) + ZooKeeper.
> 2. Producer app with default settings.
> 3. Consumer app.
>
> Results of the following tests with default sync producer settings:
>
> 1. Condition: Put network down on machine (1) for 20 mins.
> Result: Producer is working for ~16 mins. Consumer does not receive
> anything.
> After ~16 mins Producer app fails (with java.io.IOException: Connection
> timed out). Consumer app does not fail.
> Messages that were generated during the 16 mins are lost!
>
> 2. Condition: Put network down on machine (1) for 5 mins and after 5
> mins start network on (1) again.
> Result: Producer app is working, no exceptions or notification that
> network was down.
> Consumer does not receive messages for 5 mins. But when network on (1)
> is up it receives all messages.
> There are no messages lost.
>
> 3. Condition: put network down on machine (2) for 20 mins.
> Result: Producer is working for ~16 mins. Consumer does not receive
> anything.
> After ~16 mins Producer app fails (with java.io.IOException: Connection
> timed out). Consumer app does not fail.
> Messages that were generated during the 16 mins are lost! (Same result as in
> test #1)
> Kafka and ZooKeeper log that the producer is disconnected.
>
> 4. Condition: Put network down on machine (2) for 5 mins and after 5
> mins start network on (2) again.
> Result: Producer app is working, no exceptions or notification that
> network was down.
> Consumer does not receive messages for 5 mins. But when network on (2)
> is up it receives all messages. (Same result as in test #2)
> Kafka and ZooKeeper log that the producer is disconnected.
>
> 5. Condition: Kill Kafka server (0.7.2) + ZooKeeper (kill the application, do
> not shut down the network).
> Result: Producer fails in a few seconds with
> "kafka.common.NoBrokersForPartitionException: Partition = null"
> Consumer is still working even after 25 minutes.
>
> One more interesting thing. Changing the connect.timeout.ms parameter
> value for the producer
> did not change the 16 mins that I observed.
>
> Playing with the settings, I found that the only way to reduce the time it
> takes the producer to notice that the network is down is to change one of
> two parameters: reconnect.interval, reconnect.time.interval.ms
>
> So let's say we set reconnect.time.interval.ms=1000.
> This means that the producer will reconnect to Kafka every 1 second.
> In this case the producer finds out that the network is down within 1 second.
> The producer stops sending messages and throws "java.net.ConnectException:
> Connection timed out". This is the only way that I have found so far.
> In this case we do not lose too many messages, but performance may suffer.
> Or we can set reconnect.interval=1 so a reconnect will happen after each
> message sent,
> and we do not lose messages at all.
>
> Then I tested the async producer (producer.type=async).
> The results are dramatic for me, because the producer does not throw any
> exception.
> It sends messages and does not fail.
> I left it running overnight and it did not fail, even though the network
> between the Kafka server and the producer app was down.
> Playing with the async producer config parameters did not help either.
>
> My questions are:
>
> 1. Where may these 16 mins come from?
> 2. Are there any best practices to handle network down issues?
> 3. Why does the async producer never throw exceptions when the network is
> down?
> 4. What is the way to check from a sync/async producer that messages
> were really sent?
>
