spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sabarish Sasidharan <>
Subject Column Similarities using DIMSUM fails with GC overhead limit exceeded
Date Sun, 01 Mar 2015 23:31:45 GMT
I am trying to compute column similarities on a 30x1000 RowMatrix of
DenseVectors. The size of the input RDD is 3.1MB and its all in one
partition. I am running on a single node of 15G and giving the driver 1G
and the executor 9G. This is on a single node hadoop. In the first attempt
the BlockManager doesn't respond within the heart beat interval. In the
second attempt I am seeing a GC overhead limit exceeded error. And it is
almost always in the RowMatrix.columSimilaritiesDIMSUM ->
mapPartitionsWithIndex (line 570)

java.lang.OutOfMemoryError: GC overhead limit exceeded
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)

It also really seems to be running out of memory. I am seeing the following
in the attempt log
 PSYoungGen      total 2752512K, used 2359296K
  eden space 2359296K, 100% used
  from space 393216K, 0% used
  to   space 393216K, 0% used
 ParOldGen       total 6291456K, used 6291376K [0x0000000580000000,
0x0000000700000000, 0x0000000700000000)
  object space 6291456K, 99% used
 Metaspace       used 39225K, capacity 39558K, committed 39904K, reserved
  class space    used 5736K, capacity 5794K, committed 5888K, reserved

​What could be going wrong?


View raw message