kafka-users mailing list archives

From Carl Lerche ...@carllerche.com>
Subject Re: Surprisingly high network traffic between kafka servers
Date Thu, 06 Feb 2014 00:51:16 GMT
So, I tried enabling debug logging, I also made some tweaks to the
config (which I probably shouldn't have) and craziness happened.

First, some more context. Besides the very high network traffic, we
were seeing some other issues that we were not focusing on yet.

* Even though the log retention was set to 50GB & 24 hours, data logs
were getting cleaned up far quicker. I'm not entirely sure how
much quicker, but there was definitely far less than 12 hours and 1GB
of data.

* Kafka was not properly balanced. We had 3 servers, and only 2 of
them were partition leaders. One server was a replica for all
partitions. We tried to run a rebalance command, but it did not work.
We were going to investigate later.

So, after restarting all the kafkas, something happened with the
offsets. The offsets that our consumers had no longer existed. It
looks like somehow all the contents were lost? The logs show many
exceptions like:

`Request for offset 770354 but we only have log segments in the range
759234 to 759838.`

So, I reset all the consumer offsets to the head of the queue as I did
not know of anything better to do. Once the dust settled, all the
issues we were seeing vanished. Communication between Kafka nodes
appears to be normal, Kafka was able to rebalance, and hopefully log
retention will be normal.
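
For reference, the check behind that exception can be sketched roughly as follows. This is a hypothetical helper written for illustration, not part of Kafka or any Kafka client: the name `resolve_offset` and its `policy` argument are made up, and the range from the log line is assumed to be inclusive at both ends.

```python
def resolve_offset(requested, earliest, latest, policy="latest"):
    """Return a usable offset given the range the broker still holds.

    Hypothetical helper for illustration only; not a Kafka API.
    The [earliest, latest] range is assumed to be inclusive.
    """
    if earliest <= requested <= latest:
        return requested  # the offset still exists, keep it
    if policy == "earliest":
        return earliest   # restart from the oldest retained message
    if policy == "latest":
        return latest     # skip ahead to the newest retained message
    raise ValueError("unknown policy: %r" % policy)


# Numbers taken from the exception quoted above.
print(resolve_offset(770354, 759234, 759838))              # 759838
print(resolve_offset(770354, 759234, 759838, "earliest"))  # 759234
```

Note that the requested offset here is above the retained range, not below it, so the consumers were ahead of what the brokers had after the restart.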

I am unsure what happened or how to get more debug information.
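
To put the numbers from the quoted thread in perspective, here is a back-of-the-envelope sketch of what idle replica fetching should cost. It assumes the "< 1KB of overhead" figure applies to each empty fetch round trip, and it ignores how many partitions or fetcher threads are involved. Both are assumptions for illustration, not measurements.

```python
FETCH_WAIT_MS = 500                      # replica.fetch.wait.max.ms default, per the thread
OVERHEAD_BYTES = 1024                    # assumed upper bound per empty fetch round trip
OBSERVED_BYTES_PER_S = 40 * 1000 * 1000  # ~40MB/s seen between each pair of servers


def idle_fetch_bytes_per_s(wait_ms, overhead_bytes):
    """Estimate traffic from a follower polling a leader that has no new data."""
    fetches_per_s = 1000.0 / wait_ms
    return fetches_per_s * overhead_bytes


expected = idle_fetch_bytes_per_s(FETCH_WAIT_MS, OVERHEAD_BYTES)
print(expected)                         # 2048.0 bytes/s
print(OBSERVED_BYTES_PER_S / expected)  # 19531.25
```

Under those assumptions the observed traffic is roughly four orders of magnitude above what empty fetches alone would produce.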

On Wed, Feb 5, 2014 at 12:31 PM, Jay Kreps <jay.kreps@gmail.com> wrote:
> Can you enable DEBUG logging in log4j and see what requests are coming in?
>
> -Jay
>
>
> On Tue, Feb 4, 2014 at 9:51 PM, Carl Lerche <me@carllerche.com> wrote:
>
>> Hi Jay,
>>
>> I do not believe that I have changed the replica.fetch.wait.max.ms
>> setting. Here I have included the kafka config as well as a snapshot
>> of jnettop from one of the servers.
>>
>> https://gist.github.com/carllerche/4f2cf0f0f6d1e891f482
>>
>> The bottom row (89.9K/s) is the producer (it lives on a Kafka server).
>> The top two rows are Kafkas on other servers, you can see the combined
>> throughput is ~80MB/s
>>
>> On Tue, Feb 4, 2014 at 9:36 PM, Jay Kreps <jay.kreps@gmail.com> wrote:
>> > No this is not normal.
>> >
>> > Checking twice a second (using 500ms default) for new data shouldn't cause
>> > high network traffic (that should be like < 1KB of overhead). I don't think
>> > that explains things. Is it possible that setting has been overridden?
>> >
>> > -Jay
>> >
>> >
>> > On Tue, Feb 4, 2014 at 9:25 PM, Guozhang Wang <wangguoz@gmail.com> wrote:
>> >
>> >> Hi Carl,
>> >>
>> >> For each partition the follower will also fetch data from the leader
>> >> replica, even if there is no new data in the leader replicas.
>> >>
>> >> One thing you can try is to increase replica.fetch.wait.max.ms (default value
>> >> 500ms) so that the followers' fetch request frequency to the leader can
>> >> be reduced, and see if that has some effect on the traffic.
>> >>
>> >> Guozhang
>> >>
>> >>
>> >> On Tue, Feb 4, 2014 at 8:46 PM, Carl Lerche <me@carllerche.com> wrote:
>> >>
>> >> > Hello,
>> >> >
>> >> > I'm running a 0.8.0 Kafka cluster of 3 servers. The service that it is
>> >> > for is not in full production yet, so the data written to the cluster is
>> >> > minimal (seems to average between 100kb/s -> 300kb/s per server). I
>> >> > have configured Kafka to have 3 replicas. I am noticing that each
>> >> > Kafka server is talking to all the others at a data rate of 40MB/s for
>> >> > each server (so, a total of 80MB/s for each server). This
>> >> > communication is constant.
>> >> >
>> >> > Is this normal? This seems like very strange behavior and I'm not
>> >> > exactly sure how to debug.
>> >> >
>> >> > Thanks,
>> >> > Carl
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> -- Guozhang
>> >>
>>
