spark-user mailing list archives

From Zhang Victor <zhangshuai_w...@outlook.com>
Subject Re: Question About Spark Streaming Kafka Executor Memory
Date Mon, 14 Oct 2019 09:05:38 GMT
Thank you Antoni.

I found the problem! The cause is gzip compression in Hadoop: it leaks direct (off-heap)
memory, which is why the heap memory looks fine.

It does not occur when using plain text.


rdd.map(record => {
  Base64.getEncoder.encodeToString(record.value())
})// .saveAsTextFile(savePath, classOf[GzipCodec])  // gzip output: leaks direct memory
  .saveAsTextFile(savePath)                         // plain text: no leak
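For context, the "heap looks fine but the container dies" symptom fits how Java gzip compression works in general: java.util.zip.Deflater (which backs GzipOutputStream) keeps its compression state in native memory, outside the heap. This is only an illustration of the off-heap allocation and its explicit release, not the Hadoop native codec path itself:

```java
import java.util.zip.Deflater;

public class DeflaterDemo {
    public static void main(String[] args) {
        // Deflater keeps its compression state in native (off-heap) memory,
        // which is why a leak here leaves heap usage looking healthy.
        Deflater deflater = new Deflater();
        byte[] input = "hello hello hello hello".getBytes();
        deflater.setInput(input);
        deflater.finish();

        byte[] output = new byte[128];
        int compressedLen = deflater.deflate(output);
        System.out.println("compressed " + input.length + " -> " + compressedLen + " bytes");

        // Release the native buffers explicitly; without end(), they are
        // reclaimed only when the Deflater object is garbage collected.
        deflater.end();
    }
}
```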


As a workaround, I set the Hadoop config io.native.lib.available=false, though this may cause
performance degradation.

hadoop config:
    <property>
      <name>io.native.lib.available</name>
      <value>false</value>
    </property>
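If changing core-site.xml cluster-wide is too heavy-handed, the same flag can in principle be set per job on the job's Hadoop Configuration instead (in Spark, that would be the existing sc.hadoopConfiguration, not a new object). A sketch using Hadoop's standard Configuration API:

```java
import org.apache.hadoop.conf.Configuration;

// Per-job override instead of editing core-site.xml cluster-wide.
// In a Spark app, set this on sc.hadoopConfiguration rather than
// constructing a fresh Configuration.
Configuration hadoopConf = new Configuration();
hadoopConf.setBoolean("io.native.lib.available", false);
```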

GzipCodec.java


public CompressionOutputStream createOutputStream(OutputStream out)
  throws IOException {
  if (!ZlibFactory.isNativeZlibLoaded(conf)) {
    return new GzipOutputStream(out);
  }
  return CompressionCodec.Util. // attention here
      createOutputStreamWithCodecPool(this, conf, out);
}

ZlibFactory.java

public static boolean isNativeZlibLoaded(Configuration conf) {
  return nativeZlibLoaded && conf.getBoolean(
                        CommonConfigurationKeys.IO_NATIVE_LIB_AVAILABLE_KEY,
                        CommonConfigurationKeys.IO_NATIVE_LIB_AVAILABLE_DEFAULT);
}


https://issues.apache.org/jira/browse/HADOOP-10591

https://issues.apache.org/jira/browse/HADOOP-12007


________________________________
From: Antoni Ivanov <aivanov@vmware.com>
Sent: Monday, October 14, 2019 14:43
To: 张 帅 <zhangshuai_work@outlook.com>; user@spark.apache.org <user@spark.apache.org>
Subject: RE: Question About Spark Streaming Kafka Executor Memory


Hi,



You may be hitting YARN limits (e.g. have you checked yarn.scheduler.maximum-allocation-mb?).



Or are you not setting them correctly? The log below says the limit is 2.5 GB (not 4 GB).
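The 2.5 GB figure is consistent with the default overhead rule (assuming spark.executor.memory=2g and no explicit memoryOverhead): the container request is executor memory plus max(384 MB, 10% of executor memory), and YARN then rounds the request up to its allocation increment. A quick check, where the 512 MB increment (yarn.scheduler.minimum-allocation-mb) is an assumption about this cluster:

```java
public class ContainerLimit {
    public static void main(String[] args) {
        // Spark executor container request under the default overhead rule:
        // executor memory + max(384 MB, 10% of executor memory).
        int executorMemoryMb = 2048; // --executor-memory 2g
        int overheadMb = Math.max(384, executorMemoryMb / 10);
        int containerMb = executorMemoryMb + overheadMb; // 2432 MB

        // YARN rounds the request up to its allocation increment
        // (assumed 512 MB here, per yarn.scheduler.minimum-allocation-mb).
        int incrementMb = 512;
        int grantedMb = ((containerMb + incrementMb - 1) / incrementMb) * incrementMb;
        System.out.println(grantedMb + " MB"); // 2560 MB = 2.5 GB, matching the log
    }
}
```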



Or maybe you need even more memory? You could also try reducing the batch size (by decreasing
the batch duration).









From: 张 帅 <zhangshuai_work@outlook.com>
Sent: Saturday, October 12, 2019 9:20 AM
To: user@spark.apache.org
Subject: Question About Spark Streaming Kafka Executor Memory



Hi all,



I have a Spark Streaming application that consumes records from Kafka and saves the results to HDFS.



The program logic is very simple: it just maps each message to a Base64 string.



The source topic has 240 partitions, and my app runs 30 executors, each with 8 cores and 2 GB of memory.



Code:



rdd.map(record => {
  Base64.getEncoder.encodeToString(record.value())
}).saveAsTextFile(savePath, classOf[GzipCodec])

I found that some executors die because they hit the YARN container memory limit.








19/10/12 11:47:04 WARN yarn.YarnAllocator: Container killed by YARN for exceeding memory limits.
 2.5 GB of 2.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead
or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.

19/10/12 11:47:04 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver
to remove executor 14 for reason Container killed by YARN for exceeding memory limits.  2.5
GB of 2.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or
disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.

19/10/12 11:47:04 ERROR cluster.YarnClusterScheduler: Lost executor 14 on xxx: Container killed
by YARN for exceeding memory limits.  2.5 GB of 2.5 GB physical memory used. Consider boosting
spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because
of YARN-4714.










A common suggestion is to increase spark.yarn.executor.memoryOverhead; I tried setting it
to 512 MB, but the problem still happens (with executor-memory=4GB as well).



I checked the executor's GC and found no problems.







I want to know why this happens and how to solve it.



spark version: 2.3.1 / 2.4.4

kafka version: 2.0.0



Thanks