kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Guozhang Wang <wangg...@gmail.com>
Subject Re: OutOfMemoryError in mirror maker
Date Sun, 21 Jun 2015 06:20:29 GMT
Hi Tao / Jiangjie,

I think a better fix here may be not letting MirrorMakerProducerCallback to
extend from ErrorLoggingCallback, but rather change the
ErrorLoggingCallback itself as it defeats the usage of logAsString, which I
think is useful for a general error logging purposes. Rather we can
let MirrorMakerProducerCallback
to not take the value bytes itself but just the length if people agree that
for MM we probably are not interested in its message value in callback.
Thoughts?

Guozhang

On Wed, Jun 17, 2015 at 1:06 AM, tao xiao <xiaotao183@gmail.com> wrote:

> Thank you for the reply.
>
> Patch submitted https://issues.apache.org/jira/browse/KAFKA-2281
>
> On Mon, 15 Jun 2015 at 02:16 Jiangjie Qin <jqin@linkedin.com.invalid>
> wrote:
>
> > Hi Tao,
> >
> > Yes, the issue that ErrorLoggingCallback keeps value as local variable is
> > known for a while and we probably should fix it as the value is not used
> > except logging the its size. Can you open a ticket and maybe also submit
> a
> > patch?
> >
> > For unreachable objects I donĀ¹t think it is memory leak. As you said, GC
> > should take care of this. In LinkedIn we are using G1GC with some tunings
> > made by our SRE. You can try that if interested.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On 6/13/15, 11:39 AM, "tao xiao" <xiaotao183@gmail.com> wrote:
> >
> > >Hi,
> > >
> > >I am using mirror maker in trunk to replica data across two data
> centers.
> > >While the destination broker was having busy load and unresponsive the
> > >send
> > >rate of mirror maker was very low and the available producer buffer was
> > >quickly filled up. At the end mirror maker threw OOME. Detailed
> exception
> > >can be found here
> > >
> >
> https://gist.github.com/xiaotao183/53e1bf191c1a4d030a25#file-oome-exceptio
> > >n-L1
> > >
> > >I started up mirror maker with 1G memory and 256M producer buffer. I
> used
> > >eclipse MAT to analyze the heap dump and found out the retained heap
> size
> > >of all RecordBatch objects were more than 500MB half of which were used
> to
> > >retain data that were to send to destination broker which makes sense to
> > >me
> > >as it is close to 256MB producer buffer but the other half of which were
> > >used by kafka.tools.MirrorMaker$MirrorMakerProducerCallback. As every
> > >producer callback in mirror maker takes the message value and hold it
> > >until
> > >the message is successfully delivered. In my case since the destination
> > >broker was very unresponsive the message value held by callback would
> stay
> > >forever which I think is a waste and it is a major contributor to the
> OOME
> > >issue. screenshot of MAT
> > >
> >
> https://gist.github.com/xiaotao183/53e1bf191c1a4d030a25#file-mat-screensho
> > >t-png
> > >
> > >The other interesting problem I observed is that when I turned on
> > >unreachable object parsing in MAT more than 400MB memory was occupied by
> > >unreachable objects. It surprised me that gc didn't clean them up before
> > >OOME was thrown. As suggested in gc log
> > >
> >
> https://gist.github.com/xiaotao183/53e1bf191c1a4d030a25#file-oome-gc-log-L
> > >1
> > >Full GC was unable to reclaim any memory and when facing OOME these
> > >unreachable objects should have been cleaned up. so either eclipse MAT
> has
> > >issue parsing the heap dump or there is hidden memory leak that is hard
> to
> > >find. I attached the sample screenshot of the unreachable objects here
> > >
> >
> https://gist.github.com/xiaotao183/53e1bf191c1a4d030a25#file-unreachable-o
> > >bjects-png
> > >
> > >The consumer properties
> > >
> > >zookeeper.connect=zk
> > >zookeeper.connection.timeout.ms=1000000
> > >group.id=mm
> > >auto.offset.reset=smallest
> > >partition.assignment.strategy=roundrobin
> > >
> > >The producer properties
> > >
> > >bootstrap.servers=brokers
> > >client.id=mirror-producer
> > >producer.type=async
> > >compression.codec=none
> > >serializer.class=kafka.serializer.DefaultEncoder
> > >key.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
> >
> >value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer
> > >buffer.memory=268435456
> > >batch.size=1048576
> > >max.request.size=5242880
> > >send.buffer.bytes=1048576
> > >
> > >The java command to start mirror maker
> > >java -Xmx1024M -Xms512M -XX:+HeapDumpOnOutOfMemoryError
> > >-XX:HeapDumpPath=/home/kafka/slc-phx-mm-cg.hprof
> > >-XX:+PrintTenuringDistribution -XX:MaxTenuringThreshold=3 -server
> > >-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
> > >-XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC
> > >-Djava.awt.headless=true
> > >-Xloggc:/var/log/kafka/kafka-phx/cg/mirrormaker-gc.log -verbose:gc
> > >-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> > >-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> > >-XX:GCLogFileSize=10M -Dcom.sun.management.jmxremote
> > >-Dcom.sun.management.jmxremote.authenticate=false
> > >-Dcom.sun.management.jmxremote.ssl=false
> > >-Dkafka.logs.dir=/var/log/kafka/kafka-phx/cg
> >
> >-Dlog4j.configuration=file:/usr/share/kafka/bin/../config/tools-log4j.prop
> > >erties
> > >-cp libs/* kafka.tools.MirrorMaker --consumer.config
> > >consumer.properties --num.streams 10 --producer.config
> > >producer.properties --whitelist test.*
> >
> >
>



-- 
-- Guozhang

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message