kafka-users mailing list archives

From Peter Bukowinski <pmb...@gmail.com>
Subject Re: Kafka Mirror Maker place of execution
Date Tue, 12 Mar 2019 15:24:11 GMT
I have a setup with about 30 remote Kafka clusters and one cluster in a core datacenter where
I aggregate data from all the remote clusters. The remote clusters have 30 nodes each with
moderate specs. The core cluster has 100 nodes with lots of CPU, RAM, and SSD storage per
node.

I run MirrorMaker directly on the core brokers. Each broker runs one MirrorMaker instance
per edge cluster, sharing the same group.id. Since I’m running 100 instances per edge cluster,
the number of threads I use per instance = (total partition count of the topics I am mirroring) / 100. In practice,
each MM instance runs with about 25 threads, so each broker runs 25*30=750 threads of MirrorMaker.
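
As a rough sketch of that arithmetic (the partition total of 2,500 is a hypothetical figure, chosen
only so the result matches the roughly 25 threads per instance described above; the function name
is illustrative and not part of MirrorMaker):

```python
import math


def threads_per_instance(total_partitions: int, instances: int) -> int:
    """Threads each instance needs so the whole group covers every partition.

    Rounds up so that no partition is left without a consumer thread.
    """
    return math.ceil(total_partitions / instances)


# Figures from the message above.
instances_per_edge_cluster = 100  # one MirrorMaker instance per core broker
edge_clusters = 30

# Assumption: hypothetical partition total for the mirrored topics of one edge cluster.
total_partitions = 2500

per_instance = threads_per_instance(total_partitions, instances_per_edge_cluster)

# Each broker runs one instance per edge cluster, so its thread count scales with both.
per_broker = per_instance * edge_clusters

print(per_instance, per_broker)  # prints "25 750"
```

Rounding up rather than down is a choice made for this sketch: it leaves a few idle threads when
the partition count is not a multiple of 100, instead of leaving partitions unassigned.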

I’ve been running this setup for many months and it’s proved to be stable with very low
consumer lag.

Peter Bukowinski

> On Mar 12, 2019, at 6:42 AM, Ryanne Dolan <ryannedolan@gmail.com> wrote:
> Franz, you can run MM on or near either source or target cluster, but it's
> more efficient near the target because this minimizes producer latency. If
> latency is high, producers will block waiting on ACKs for in-flight records,
> which reduces throughput.
> I recommend running MM near the target cluster but not necessarily on the
> same machines, because often Kafka nodes are relatively expensive, with SSD
> arrays and huge IO bandwidth etc, which isn't necessary for MM.
> Ryanne
> On Tue, Mar 12, 2019, 8:13 AM Franz van Betteraey <fvbetteraey@web.de>
> wrote:
>> Hi all,
>> there are best practices out there which recommend running the Mirror Maker
>> on the target cluster.
>> https://community.hortonworks.com/articles/79891/kafka-mirror-maker-best-practices.html
>> I wonder why this recommendation exists because ultimately all data must
>> cross the border between the clusters, regardless of whether they are
>> consumed at the target or produced at the source. A reason I can imagine is
>> that the Mirror Maker supports multiple consumers but only one producer -
>> so consuming data on the way with the greater latency might be sped up by
>> the use of multiple consumers.
>> If multithreaded performance is the point, would it be useful
>> to use several producers (one per consumer) to replicate the data (with a
>> custom replication process)? Does anyone know why the Mirror Maker shares
>> a single producer among all consumers?
>> My use case is the replication of data from several source clusters (~10) to
>> a single target cluster. I would prefer to run the replication process on
>> the source clusters to avoid too many replication processes (one per
>> source) on the target cluster.
>> Hints and suggestions on this topic are very welcome.
>> Best regards
>>  Franz
>> If you would like to earn some SO recommendation points feel free to
>> answer this question on SO ;-)
>> https://stackoverflow.com/q/55122268/367285
