spark-user mailing list archives

From Stephen Haberman <stephen.haber...@gmail.com>
Subject oome from blockmanager
Date Sat, 26 Oct 2013 18:43:53 GMT
Hi,

By dropping spark.shuffle.file.buffer.kb to 10 (i.e. 10kb buffers) and
using Snappy (thanks, Aaron), the job I'm trying to run is no longer
OOMEing because of 300k LZF buffers taking up 4g of RAM.

But...now it's OOMEing because BlockManager is taking ~3.5gb of RAM
(which is ~90% of the available heap).

Specifically, it's two ConcurrentHashMaps:

* BlockManager.blockInfo has ~1gb retained, AFAICT from ~5.5 million
  entries of (ShuffleBlockId, (BlockInfo, Long))

* BlockManager's DiskBlockManager.blockToFileSegmentMap has ~2.3gb
  retained, AFAICT from about the same ~5.5 million entries of
  (ShuffleBlockId, (FileSegment, Long)).
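
As a rough sanity check on those numbers, the retained sizes can be divided by the entry count to get an average cost per entry. This is only back-of-the-envelope arithmetic: it assumes "gb" means 2^30 bytes and that both maps hold exactly 5.5 million entries, neither of which the heap dump states precisely.

```python
# Back-of-the-envelope per-entry cost for the two maps.
# Assumptions (not measured): 1gb = 2**30 bytes, exactly 5.5 million entries each.
GIB = 2**30
ENTRIES = 5_500_000

def bytes_per_entry(retained_gib, entries=ENTRIES):
    """Average retained bytes per map entry."""
    return retained_gib * GIB / entries

block_info = bytes_per_entry(1.0)    # BlockManager.blockInfo
file_segment = bytes_per_entry(2.3)  # DiskBlockManager.blockToFileSegmentMap

print(f"blockInfo:             ~{block_info:.0f} bytes/entry")
print(f"blockToFileSegmentMap: ~{file_segment:.0f} bytes/entry")
```

Under those assumptions each entry costs a few hundred bytes, so the problem looks like the sheer number of entries rather than any single entry being large.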

The job stalls about 3,000 tasks through a 7,000-partition shuffle that
is loading ~500gb from S3 on 5 m1.large (4gb heap) machines. The job
did a few smaller ~50-partition shuffles before this larger one, but
nothing crazy. It's an on-demand/EMR cluster, in standalone mode. 
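
For what it's worth, here is a rough estimate of how many shuffle blocks each machine might be tracking at the point of the stall. It rests on assumptions that are not confirmed anywhere in this message: that the ~3,000 finished tasks are map tasks of the big shuffle, that each map task registers one shuffle block per reduce partition, and that tasks are spread evenly across the 5 machines.

```python
# Rough estimate of shuffle blocks tracked per machine at the stall.
# Assumptions (unconfirmed):
#  - the ~3,000 finished tasks are map tasks of the big shuffle
#  - each map task registers one shuffle block per reduce partition
#  - map tasks are spread evenly over the machines
def blocks_per_machine(map_tasks_done, reduce_partitions, machines):
    total_blocks = map_tasks_done * reduce_partitions
    return total_blocks // machines

estimate = blocks_per_machine(3_000, 7_000, 5)
print(f"~{estimate / 1e6:.1f} million shuffle blocks per machine")
```

If those assumptions hold, the estimate lands in the same ballpark as the ~5.5 million entries seen in the heap dump, with the earlier, smaller shuffles plausibly accounting for part of the difference.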

Both of these maps are TimeStampedHashMaps, which kind of makes me
shudder, but we have the cleaner disabled which AFAIK is what we want,
because we aren't running long-running streaming jobs. And AFAIU if the
hash map did get cleaned up mid-shuffle, lookups would just start
failing (which was actually happening for this job on Spark 0.7 and is
what prompted us to get around to trying Spark 0.8).
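
To illustrate the concern about cleaning mid-shuffle, here is a toy timestamped map. It is not Spark's TimeStampedHashMap, just a minimal sketch of the idea; the class, method names, and block keys are made up for illustration.

```python
import time

class ToyTimeStampedMap:
    """Toy stand-in for a timestamped map; not Spark's actual class."""

    def __init__(self):
        self._data = {}  # key -> (value, insertion time in seconds)

    def put(self, key, value, now=None):
        self._data[key] = (value, time.time() if now is None else now)

    def get(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[0]

    def clear_old_values(self, thresh_time):
        """Drop every entry inserted before thresh_time."""
        stale = [k for k, (_, ts) in self._data.items() if ts < thresh_time]
        for key in stale:
            del self._data[key]

blocks = ToyTimeStampedMap()
blocks.put("block-early", "segment-a", now=100.0)  # written early in the shuffle
blocks.put("block-late", "segment-b", now=200.0)   # written later

blocks.clear_old_values(thresh_time=150.0)         # cleaner fires mid-shuffle

print(blocks.get("block-early"))  # early block's metadata is gone
print(blocks.get("block-late"))   # later block's metadata survives
```

A reducer that still needs the early block would find nothing in the map, which matches the failing lookups described above.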

So, I haven't really figured out BlockManager yet--any hints on what we
could do here? More machines? Should there really be this many entries
in it for a shuffle of this size?

I know 5 machines/4gb of RAM isn't a lot, and I could use more if
needed, but I just expected the job to go slower, not OOME.

Also, I technically have a heap dump from an m1.xlarge (~15gb of RAM)
cluster that also OOMEd on the same job, but I can't open the file on
my laptop, so I can't tell if it was OOMEing for this issue or another
one. (It was not using Snappy, but was using 10kb file buffers, so I'm
interested to see what happened to it.)

- Stephen

