spark-user mailing list archives

From Yichen Zhou <zhouy...@gmail.com>
Subject Shuffle write explosion
Date Mon, 05 Nov 2018 07:41:52 GMT
Hi All,

When running a Spark job, I have 100 MB+ of input but get more than 700 GB of
shuffle write, which is really weird. The job eventually fails with an OOM
error. Does anybody know why this happens?
[image: Screen Shot 2018-11-05 at 15.20.35.png]
My code looks like this:

> JavaPairRDD<Text, Text> inputRDD = sc.sequenceFile(inputPath, Text.class, Text.class);
> inputRDD.repartition(partitionNum).mapToPair(...).saveAsNewAPIHadoopDataset(job.getConfiguration());

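For context, here is a minimal, self-contained sketch of the whole job. The class name, paths, partition count, the Kryo serializer setting, the SequenceFileOutputFormat output setup, and the identity mapToPair are placeholders I filled in; the real job's elided mapToPair(...) and job configuration may differ.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ShuffleRepro {
    public static void main(String[] args) throws Exception {
        // Placeholder paths and partition count -- not the real job's values.
        String inputPath = "hdfs:///tmp/input";
        String outputPath = "hdfs:///tmp/output";
        int partitionNum = 200;

        SparkConf conf = new SparkConf()
                .setAppName("ShuffleRepro")
                // Hadoop's Text is not java.io.Serializable, so Kryo is assumed here
                // so that repartition() can shuffle Text records.
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Hadoop Job carrying the output configuration for saveAsNewAPIHadoopDataset.
        Job job = Job.getInstance(sc.hadoopConfiguration());
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        SequenceFileOutputFormat.setOutputPath(job, new Path(outputPath));

        // Read the (Text, Text) sequence file, exactly as in the snippet above.
        JavaPairRDD<Text, Text> inputRDD =
                sc.sequenceFile(inputPath, Text.class, Text.class);

        // repartition() forces a full shuffle; the identity mapToPair stands in
        // for the real (elided) transformation.
        inputRDD.repartition(partitionNum)
                .mapToPair(kv -> new Tuple2<Text, Text>(kv._1(), kv._2()))
                .saveAsNewAPIHadoopDataset(job.getConfiguration());

        sc.stop();
    }
}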

Environment:

CPU: 32 cores; Memory: 256 GB; Storage: 7.5G
OS: CentOS 7.5
Java version: 1.8.0_162
Spark: 2.1.2

Any help is greatly appreciated.

Regards,
Yichen
