spark-user mailing list archives

From Bahubali Jain <bahub...@gmail.com>
Subject Compression during shuffle writes
Date Fri, 10 Nov 2017 03:54:36 GMT
Hi,
I have 500 GB of compressed data. I am repartitioning it because the
underlying data is heavily skewed, which causes a lot of issues for the
downstream jobs.
During repartitioning the *shuffle writes* are not getting compressed, and
because of this I am running into disk space issues. The screenshot below
shows the issue (compare the Input and Shuffle Write columns).
I have explicitly set the parameters below to true, but the intermediate
shuffle data is still not compressed:

spark.shuffle.compress
spark.shuffle.spill.compress
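
For reference, this is roughly how the two properties above can be set and then read back, to confirm that the running job actually sees them. It is only a minimal sketch against the Spark 1.x SparkConf API: the object name and application name are made up for illustration, and the master URL is assumed to be supplied by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver object, used only to illustrate the configuration.
object ShuffleCompressionCheck {
  def main(args: Array[String]): Unit = {
    // Set both properties before the SparkContext is created;
    // changes made to this SparkConf afterwards are not picked up.
    val conf = new SparkConf()
      .setAppName("shuffle-compression-check") // hypothetical application name
      .set("spark.shuffle.compress", "true")
      .set("spark.shuffle.spill.compress", "true")

    // The master URL is assumed to come from spark-submit (--master).
    val sc = new SparkContext(conf)

    // Read the values back from the live context. The fallback of true
    // matches the documented default for both properties.
    val effective = sc.getConf
    println("spark.shuffle.compress = " +
      effective.getBoolean("spark.shuffle.compress", true))
    println("spark.shuffle.spill.compress = " +
      effective.getBoolean("spark.shuffle.spill.compress", true))

    sc.stop()
  }
}
```

The same values should also be listed on the Environment tab of the Spark web UI, which is another way to check that they reached the job.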

[image: Inline image 1]

I am using Spark 1.5 (for various unavoidable reasons!!)
Any suggestions would be greatly appreciated.

Thanks,
Baahu
