kafka-users mailing list archives

From Evan Huus <evan.h...@shopify.com>
Subject Re: Fetch Request Purgatory and Mirrormaker
Date Thu, 23 Apr 2015 19:14:22 GMT
This is still occurring for us. In addition, it has started occurring on
one of the six nodes in the "healthy" cluster, for no reason we have been
able to determine.

We're willing to put in some serious time to help debug/solve this, but we
need *some* hint as to where to start. I understand that purgatory has been
rewritten (again) in 0.8.3, so might it be worth trying a trunk build? Is
there an ETA for a beta release of 0.8.3?
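For reference, the purge-interval overrides we experimented with (described in the quoted thread below) look like this in `server.properties` — a sketch only; 1000 is the 0.8.2 default for both settings, and 200 was the experimental value we tried:

```properties
# Broker settings controlling how often satisfied/expired requests are
# purged from the fetch and producer purgatories.
# (Sketch; 1000 is the 0.8.2 default, 200 was our experiment.)
fetch.purgatory.purge.interval.requests=200
producer.purgatory.purge.interval.requests=200
```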

Thanks,
Evan

On Tue, Apr 14, 2015 at 8:40 PM, Evan Huus <evan.huus@shopify.com> wrote:

> On Tue, Apr 14, 2015 at 8:31 PM, Jiangjie Qin <jqin@linkedin.com.invalid>
> wrote:
>
>> Hey Evan,
>>
>> Is this issue only observed when mirror maker is consuming? It looks like
>> you have some other consumers for Cluster A.
>> Do you mean if you stop mirror maker the problem goes away?
>>
>
> Yes, exactly. The setup is A -> Mirrormaker -> B so mirrormaker is
> consuming from A and producing to B.
>
> Cluster A is always fine. Cluster B is fine when mirrormaker is stopped.
> Cluster B has the weird purgatory issue when mirrormaker is running.
>
> Today I rolled out a change to reduce the
> `fetch.purgatory.purge.interval.requests` and
> `producer.purgatory.purge.interval.requests` configuration values on
> cluster B from 1000 to 200, but it had no effect, which I find really weird.
>
> Thanks,
> Evan
>
>
>> Jiangjie (Becket) Qin
>>
>> On 4/14/15, 6:55 AM, "Evan Huus" <evan.huus@shopify.com> wrote:
>>
>> >Any ideas on this? It's still occurring...
>> >
>> >Is there a separate mailing list or project for mirrormaker that I could
>> >ask?
>> >
>> >Thanks,
>> >Evan
>> >
>> >On Thu, Apr 9, 2015 at 4:36 PM, Evan Huus <evan.huus@shopify.com> wrote:
>> >
>> >> Hey Folks, we're running into an odd issue with mirrormaker and the
>> >> fetch request purgatory on the brokers. Our setup consists of two
>> >> six-node clusters (all running 0.8.2.1 on identical hw with the same
>> >> config). All "normal" producing and consuming happens on cluster A.
>> >> Mirrormaker has been set up to copy all topics (except a tiny
>> >> blacklist) from cluster A to cluster B.
>> >>
>> >> Cluster A is completely healthy at the moment. Cluster B is not, which
>> >> is very odd since it is literally handling the exact same traffic.
>> >>
>> >> The graph for Fetch Request Purgatory Size looks like this:
>> >>
>> >> https://www.dropbox.com/s/k87wyhzo40h8gnk/Screenshot%202015-04-09%2016.08.37.png?dl=0
>> >>
>> >> Every time the purgatory shrinks, the latency from that causes one or
>> >> more nodes to drop their leadership (it quickly recovers). We could
>> >> probably alleviate the symptoms by decreasing
>> >> `fetch.purgatory.purge.interval.requests` (it is currently at the
>> >> default value) but I'd rather try and understand/solve the root cause
>> >> here.
>> >>
>> >> Cluster B is handling no outside fetch requests, and turning
>> >> mirrormaker off "fixes" the problem, so clearly (since mirrormaker is
>> >> producing to this cluster, not consuming from it) the fetch requests
>> >> must be coming from internal replication. However, the same data is
>> >> being replicated when it is originally produced in cluster A, and the
>> >> fetch purgatory size sits stably at ~10k there. There is nothing
>> >> unusual in the logs on either cluster.
>> >>
>> >> I have read all the wiki pages and jira tickets I can find about the
>> >> new purgatory design in 0.8.2 but nothing stands out as applicable.
>> >> I'm happy to provide more detailed logs, configuration, etc. if anyone
>> >> thinks there might be something important in there. I am completely
>> >> baffled as to what could be causing this.
>> >>
>> >> Any suggestions would be appreciated. I'm starting to think at this
>> >> point that we've completely misunderstood or misconfigured *something*.
>> >>
>> >> Thanks,
>> >> Evan
>> >>
>>
>>
>
