kafka-users mailing list archives

From Jiangjie Qin <j...@linkedin.com.INVALID>
Subject Re: Fetch Request Purgatory and Mirrormaker
Date Wed, 15 Apr 2015 00:31:08 GMT
Hey Evan,

Is this issue only observed when mirror maker is consuming? It looks like
you have some other consumers on Cluster A.
Do you mean if you stop mirror maker the problem goes away?

Jiangjie (Becket) Qin

On 4/14/15, 6:55 AM, "Evan Huus" <evan.huus@shopify.com> wrote:

>Any ideas on this? It's still occurring...
>Is there a separate mailing list or project for mirrormaker that I could ask?
>On Thu, Apr 9, 2015 at 4:36 PM, Evan Huus <evan.huus@shopify.com> wrote:
>> Hey Folks, we're running into an odd issue with mirrormaker and the
>> request purgatory on the brokers. Our setup consists of two six-node
>> clusters (all running on identical hw with the same config). All
>> "normal" producing and consuming happens on cluster A. Mirrormaker has
>> been set up to copy all topics (except a tiny blacklist) from cluster A to
>> cluster B.
>> Cluster A is completely healthy at the moment. Cluster B is not, which
>> is very odd since it is literally handling the exact same traffic.
>> The graph for Fetch Request Purgatory Size looks like this:
>> Every time the purgatory shrinks, the latency from that causes one or
>> more nodes to drop their leadership (they quickly recover). We could probably
>> alleviate the symptoms by decreasing
>> `fetch.purgatory.purge.interval.requests` (it is currently at the
>> value) but I'd rather try and understand/solve the root cause here.
>> Cluster B is handling no outside fetch requests, and turning mirrormaker
>> off "fixes" the problem, so clearly (since mirrormaker is producing to the
>> cluster, not consuming from it) the fetch requests must be coming from
>> internal replication. However, the same data is being replicated when it
>> is originally produced in cluster A, and the fetch purgatory size sits
>> at ~10k there. There is nothing unusual in the logs on either cluster.
>> I have read all the wiki pages and jira tickets I can find about the new
>> purgatory design in 0.8.2, but nothing stands out as applicable. I'm happy
>> to provide more detailed logs, configuration, etc. if anyone thinks there
>> might be something important in there. I am completely baffled as to what
>> could be causing this.
>> Any suggestions would be appreciated. I'm starting to think at this
>> point that we've completely misunderstood or misconfigured *something*.
>> Thanks,
>> Evan
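
The grow-then-shrink purgatory pattern in the quoted report, and the idea that lowering `fetch.purgatory.purge.interval.requests` might ease it, can be illustrated with a small toy model. This is a sketch under stated assumptions, not Kafka 0.8.2's actual purgatory code: `ToyPurgatory`, `simulate`, and the rule "purge after a fixed number of newly parked requests" are all hypothetical, with `purge_interval` standing in for the broker setting. In the model, completed requests linger until the next purge pass, so the backlog (and the work each purge does) scales with the interval.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPurgatory:
    """Toy model, NOT Kafka's implementation: a completed request stays in
    the watch map until the next purge pass removes it."""

    # Hypothetical stand-in for fetch.purgatory.purge.interval.requests:
    # a purge pass runs once this many new requests have been parked.
    purge_interval: int
    watched: dict = field(default_factory=dict)  # request id -> satisfied?
    added_since_purge: int = 0

    def add(self, request_id: int) -> None:
        """Park a new delayed request; purge if the interval is reached."""
        self.watched[request_id] = False
        self.added_since_purge += 1
        if self.added_since_purge >= self.purge_interval:
            self.purge()

    def satisfy(self, request_id: int) -> None:
        """Mark a request complete; it is NOT removed until the next purge."""
        if request_id in self.watched:
            self.watched[request_id] = True

    def purge(self) -> None:
        """Drop every satisfied entry in one pass (cost grows with size)."""
        self.watched = {rid: done for rid, done in self.watched.items() if not done}
        self.added_since_purge = 0

    def size(self) -> int:
        return len(self.watched)


def simulate(num_requests: int, purge_interval: int) -> list[int]:
    """Each request completes right after it is parked; record the size."""
    purgatory = ToyPurgatory(purge_interval)
    sizes = []
    for rid in range(num_requests):
        purgatory.add(rid)
        purgatory.satisfy(rid)
        sizes.append(purgatory.size())
    return sizes


if __name__ == "__main__":
    print(simulate(8, 4))  # [1, 2, 3, 1, 2, 3, 4, 1]
    # A smaller interval caps the backlog (and each purge pass) lower.
    print(max(simulate(1000, 1000)), max(simulate(1000, 100)))  # 999 100
```

Under these assumptions, a smaller interval keeps the peak size and the per-purge work lower, which matches the expectation that tuning the setting would only treat the symptom. The model says nothing about why cluster B parks so many more replication fetch requests than cluster A in the first place.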
