kafka-users mailing list archives

From "Franz van Betteraey" <fvbetter...@web.de>
Subject Aw: Re: Kafka Mirror Maker place of execution
Date Tue, 12 Mar 2019 17:02:06 GMT
Hi Peter,

These are remarkable numbers, but to be honest I do not understand where you run the Mirror Maker.
Do you run the instances near the remote clusters or near the target (core?) datacenter cluster?

As I understand it, you run 30 MirrorMaker instances (one for each remote cluster) on each of
the 100 Kafka nodes of your core datacenter cluster.
So you run the Mirror Maker on the same machines as the Kafka nodes and do not use dedicated
machines for the Mirror Maker process?

Best regards,

Sent: Tuesday, 12 March 2019 at 16:24
From: "Peter Bukowinski" <pmbuko@gmail.com>
To: users@kafka.apache.org
Subject: Re: Kafka Mirror Maker place of execution
I have a setup with about 30 remote Kafka clusters and one cluster in a core datacenter where
I aggregate data from all the remote clusters. The remote clusters have 30 nodes each with
moderate specs. The core cluster has 100 nodes with lots of CPU, RAM, and SSD storage per node.

I run MirrorMaker directly on the core brokers. Each broker runs one MirrorMaker instance
per edge cluster, sharing the same group.id. Since I’m running 100 instances per edge cluster,
the number of threads I use = (total partition count of topics I am mirroring) / 100. In practice,
each MM instance runs with about 25 threads, so each broker runs 25*30=750 threads of MirrorMaker.
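
The thread arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the partition count of 2500 is an assumed figure chosen so that the result matches the "about 25 threads" mentioned above, and rounding up is an assumption so that no partition is left without a consumer thread.

```python
import math


def threads_per_instance(total_partitions: int, instances: int) -> int:
    """Threads per MirrorMaker instance so that the whole consumer group
    has roughly one consumer thread per mirrored partition."""
    # Rounding up is an assumption; the message only states partitions / instances.
    return math.ceil(total_partitions / instances)


# Figures from the message: 100 core brokers, about 30 edge clusters.
instances_per_edge_cluster = 100
edge_clusters = 30

# Assumed value, not from the message: partitions mirrored per edge cluster.
assumed_partitions_per_edge_cluster = 2500

per_instance = threads_per_instance(
    assumed_partitions_per_edge_cluster, instances_per_edge_cluster
)
# Each broker runs one instance per edge cluster.
per_broker = per_instance * edge_clusters

print(f"threads per MM instance: {per_instance}")
print(f"MM threads per broker:   {per_broker}")
```

With these assumed inputs the sketch reproduces the 25 threads per instance and 25*30=750 threads per broker quoted above.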

I’ve been running this setup for many months and it’s proved to be stable with very low
consumer lag.

Peter Bukowinski

> On Mar 12, 2019, at 6:42 AM, Ryanne Dolan <ryannedolan@gmail.com> wrote:
> Franz, you can run MM on or near either source or target cluster, but it's
> more efficient near the target because this minimizes producer latency. If
> latency is high, producers will block waiting on ACKs for in-flight records,
> which reduces throughput.
> I recommend running MM near the target cluster but not necessarily on the
> same machines, because often Kafka nodes are relatively expensive, with SSD
> arrays and huge IO bandwidth etc, which isn't necessary for MM.
> Ryanne
> On Tue, Mar 12, 2019, 8:13 AM Franz van Betteraey <fvbetteraey@web.de>
> wrote:
>> Hi all,
>> there are best practices out there which recommend running the Mirror Maker
>> on the target cluster.
>> https://community.hortonworks.com/articles/79891/kafka-mirror-maker-best-practices.html
>> I wonder why this recommendation exists because ultimately all data must
>> cross the border between the clusters, regardless of whether they are
>> consumed at the target or produced at the source. A reason I can imagine is
>> that the Mirror Maker supports multiple consumers but only one producer,
>> so consuming data over the link with the greater latency might be sped up by
>> the use of multiple consumers.
>> If performance through multithreading is the point, would it be useful
>> to use several producers (one per consumer) to replicate the data (with a
>> custom replication process)? Does anyone know why the Mirror Maker shares
>> a single producer among all consumers?
>> My use case is the replication of data from several source clusters (~10) to
>> a single target cluster. I would prefer to run the replication process on
>> the source clusters to avoid too many replication processes (one per
>> source) on the target cluster.
>> Hints and suggestions on this topic are very welcome.
>> Best regards
>> Franz
>> If you would like to earn some SO recommendation points feel free to
>> answer this question on SO ;-)
>> https://stackoverflow.com/q/55122268/367285
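
Ryanne's point in the quoted reply above, that high producer latency reduces throughput, can be illustrated with a rough back-of-the-envelope model. This is a deliberately simplified sketch and not how the Kafka producer is implemented: it assumes one full batch per request, a single broker connection, and no bandwidth limit or compression. The function name is hypothetical. The values 16384 and 5 are the Kafka producer defaults for `batch.size` and `max.in.flight.requests.per.connection`; the two round-trip times are assumed example values.

```python
def max_producer_throughput(batch_bytes: int, in_flight: int, rtt_seconds: float) -> float:
    """Upper bound on bytes/second for one connection when at most
    `in_flight` requests of `batch_bytes` each can be unacknowledged
    at a time and each ACK takes one round trip."""
    return batch_bytes * in_flight / rtt_seconds


BATCH_BYTES = 16384  # producer default for batch.size
IN_FLIGHT = 5        # producer default for max.in.flight.requests.per.connection

# Assumed round-trip times: producer near the target vs. across a WAN link.
near = max_producer_throughput(BATCH_BYTES, IN_FLIGHT, rtt_seconds=0.001)
far = max_producer_throughput(BATCH_BYTES, IN_FLIGHT, rtt_seconds=0.100)

print(f"near target: {near / 1e6:.1f} MB/s per connection")
print(f"far away:    {far / 1e6:.1f} MB/s per connection")
```

Under this model the bound scales inversely with the round-trip time, which is why placing the producer side close to the target cluster helps. The consumer side can compensate for latency more easily by fetching from many partitions in parallel.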
