spark-user mailing list archives

From nmoretto <>
Subject DIMSUM among 550k objects on AWS Elastic Map Reduce fails with OOM errors
Date Fri, 27 May 2016 10:38:28 GMT
Hello everyone, I am trying to compute the similarity between 550k objects
using the DIMSUM algorithm available in Spark 1.6.

The cluster runs on AWS Elastic Map Reduce and consists of 6 r3.2xlarge
instances (one master and five cores), having 8 vCPU and 61 GiB of RAM each.

My input data is a 3.5GB CSV file hosted on AWS S3, which I use to build a
RowMatrix with 550k columns and 550k rows, passing sparse vectors as rows to
the RowMatrix constructor.
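For context, a RowMatrix is built from a collection of row vectors, so the CSV has to be grouped into one sparse vector per row first. A minimal non-Spark sketch of that grouping step, assuming the CSV holds (row, col, value) triples (the post doesn't describe the file layout, so that format is an assumption):

```python
import csv
import io

# Hypothetical CSV of (row_index, col_index, value) triples -- an assumed
# layout for illustration only.
raw = "0,1,0.5\n0,3,1.0\n1,1,2.0\n"

def sparse_rows(lines):
    """Group triples into per-row sparse vectors {col: value}, analogous to
    the sparse vectors passed to the RowMatrix constructor."""
    rows = {}
    for r, c, v in csv.reader(lines):
        rows.setdefault(int(r), {})[int(c)] = float(v)
    return rows

rows = sparse_rows(io.StringIO(raw))
```

With 550k rows of 550k possible columns, keeping the rows sparse like this is what makes the matrix fit in memory at all; densifying them would need roughly 550k * 550k * 8 bytes.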

At every attempt I've made so far, the application fails during the
/mapPartitionsWithIndex/ stage of the /RowMatrix.columnSimilarities()/ method
(source code at
) with YARN containers either 1) exiting with /FAILURE/ due to an
/OutOfMemory/ exception on the Java heap space (thrown by Spark, apparently)
or 2) being terminated by the AM (and increasing
/spark.yarn.executor.memoryOverhead/ as suggested doesn't seem to help).
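The "terminated by AM" case is typically YARN killing a container whose physical memory use exceeded its requested size. A rough sketch of the container-sizing arithmetic, assuming Spark 1.6's default overhead rule of max(384 MB, 10% of executor memory):

```python
def yarn_container_mb(executor_mem_mb, overhead_fraction=0.10, floor_mb=384):
    """Approximate YARN container request for a Spark executor:
    executor memory plus max(floor, fraction * executor memory).
    Spark 1.6 defaults are assumed: 10% overhead, 384 MB minimum."""
    overhead = max(floor_mb, int(executor_mem_mb * overhead_fraction))
    return executor_mem_mb + overhead

# e.g. a 15 GB executor requests roughly 15 GB + 1.5 GB from YARN; anything
# the JVM uses beyond that (off-heap buffers, netty, etc.) gets the
# container killed by the AM.
```

If the kills persist even at 40% overhead, the per-task working set itself (the similarity blocks being materialized) is probably too large, rather than the overhead being mis-sized.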

I tried and combined different approaches without noticing significant
improvements:
- setting the AWS EMR maximizeResourceAllocation option to true (details at
)
- increasing the number of partitions (via /spark.default.parallelism/, up
to 8000)
- increasing the driver and executor memory (from the defaults of ~512M and
~5G up to ~50G and ~15G, respectively)
- increasing YARN memory overhead (from default 10% up to 40% of driver and
executor memory, respectively)
- setting the DIMSUM threshold to 0.5 and 0.8 to reduce the number of
similarity pairs computed
Does anyone have any idea about the possible cause(s) of these errors? Thank
you.
