spark-user mailing list archives

From Reza Zadeh <>
Subject Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded
Date Mon, 02 Mar 2015 03:00:47 GMT
Hi Sabarish,

It works fine for me with smaller settings than yours (30x1000 dense matrix,
1GB driver, 1GB executor):

bin/spark-shell --driver-memory 1G --executor-memory 1G
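One way to check the heap the driver actually received is to ask the JVM that runs the shell, since that JVM is the driver. This is a minimal sketch: `HeapCheck` and `maxHeapMiB` are hypothetical names, and only `Runtime.getRuntime.maxMemory` is a standard JVM call.

```scala
// Hypothetical helper: reports the maximum heap of the JVM it runs in.
// Pasted into spark-shell, that JVM is the driver, so the figure should
// roughly track --driver-memory.
object HeapCheck {
  private val BytesPerMiB = 1024L * 1024L

  // Runtime.maxMemory returns the max heap in bytes (Long.MaxValue if unbounded).
  def maxHeapMiB(): Long = Runtime.getRuntime.maxMemory / BytesPerMiB

  def main(args: Array[String]): Unit =
    println(s"Max heap available to this JVM: ${maxHeapMiB()} MiB")
}
```

In the shell, `HeapCheck.maxHeapMiB()` can be called directly. The reported value is often slightly below the configured size. On a cluster each executor is a separate JVM, so this check only covers the driver.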

Then running the following finished without trouble and in a few seconds.
Are you sure your driver is actually getting the RAM you think you gave it?

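import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Create 30x1000 matrix of random values, spread over 4 partitions
val rows = sc.parallelize(1 to 30, 4).map { line =>
  val values = Array.tabulate(1000)(x => scala.math.random)
  Vectors.dense(values)
}
val mat = new RowMatrix(rows)

// Compute similar columns perfectly, with brute force.
// Summing the entry values is an action, so it forces the job to run.
val exact = mat.columnSimilarities().entries.map(x => x.value).sum()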
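For intuition about what the brute-force `columnSimilarities()` call computes, here is a small local sketch in plain Scala. It is not Spark's distributed implementation, only the same arithmetic on an in-memory array; `BruteForceColumnSims`, `columnCosines` and `pairCount` are hypothetical names. Each column is divided by its L2 norm and a dot product is taken for every column pair with i < j, giving the upper-triangular set of cosine similarities the Spark method returns.

```scala
object BruteForceColumnSims {
  // rows(r)(c) is the entry in row r, column c; all rows have the same length.
  def columnCosines(rows: Array[Array[Double]]): Map[(Int, Int), Double] = {
    val n = rows.head.length
    // L2 norm of each column.
    val norms = Array.tabulate(n)(c => math.sqrt(rows.map(r => r(c) * r(c)).sum))
    val entries = for {
      i <- 0 until n
      j <- (i + 1) until n // upper triangle only: i < j
      if norms(i) != 0.0 && norms(j) != 0.0
    } yield {
      // Dot product of columns i and j, divided by the product of their norms.
      val dot = rows.map(r => r(i) * r(j)).sum
      (i, j) -> dot / (norms(i) * norms(j))
    }
    entries.toMap
  }

  // Number of column pairs scored for n columns with no all-zero columns: n(n-1)/2.
  def pairCount(n: Int): Long = n.toLong * (n - 1) / 2

  def main(args: Array[String]): Unit = {
    val m = Array(Array(1.0, 0.0, 1.0), Array(0.0, 1.0, 1.0))
    columnCosines(m).toSeq.sortBy(_._1).foreach(println)
    println(pairCount(1000)) // 499500
  }
}
```

For a 30x1000 dense matrix, each row contributes a product to every one of the 1000 * 999 / 2 = 499,500 column pairs. The overload `columnSimilarities(threshold)` uses DIMSUM sampling instead: a positive threshold costs strictly less than brute force, at the price of returning estimates.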

On Sun, Mar 1, 2015 at 3:31 PM, Sabarish Sasidharan <> wrote:

> I am trying to compute column similarities on a 30x1000 RowMatrix of
> DenseVectors. The size of the input RDD is 3.1MB and it's all in one
> partition. I am running on a single node with 15G and giving the driver 1G
> and the executor 9G. This is on a single-node Hadoop setup. In the first
> attempt, the BlockManager doesn't respond within the heartbeat interval. In
> the second attempt, I see a "GC overhead limit exceeded" error, almost
> always in RowMatrix.columnSimilaritiesDIMSUM ->
> mapPartitionsWithIndex (line 570):
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at org.apache.spark.mllib.linalg.distributed.RowMatrix$$anonfun$19$$anonfun$apply$2.apply(RowMatrix.scala:570)
>         at org.apache.spark.mllib.linalg.distributed.RowMatrix$$anonfun$19$$anonfun$apply$2.apply(RowMatrix.scala:528)
>         at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> It also really does seem to be running out of memory. I see the
> following in the attempt log:
> Heap
>  PSYoungGen      total 2752512K, used 2359296K
>   eden space 2359296K, 100% used
>   from space 393216K, 0% used
>   to   space 393216K, 0% used
>  ParOldGen       total 6291456K, used 6291376K [0x0000000580000000, 0x0000000700000000, 0x0000000700000000)
>   object space 6291456K, 99% used
>  Metaspace       used 39225K, capacity 39558K, committed 39904K, reserved 1083392K
>   class space    used 5736K, capacity 5794K, committed 5888K, reserved 1048576K
>
> What could be going wrong?
> Regards
> Sab
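As a worked check on the heap summary in the quoted log, adding up the generations suggests the JVM in that attempt really was sized at 9 GiB, consistent with the 9G executor setting, and that its old generation had only 80 KiB free. This is a sketch: `HeapLogArithmetic` is a hypothetical name and the figures are copied from the log. `PSYoungGen total` counts eden plus one survivor space (2359296K + 393216K = 2752512K), so both survivor spaces are added back to get the full heap.

```scala
object HeapLogArithmetic {
  // Sizes in KiB, copied from the "Heap" summary in the attempt log.
  val edenKiB       = 2359296L
  val survivorKiB   = 393216L // size of each survivor space ("from" and "to")
  val oldGenKiB     = 6291456L
  val oldGenUsedKiB = 6291376L

  // PSYoungGen "total" = eden + one survivor space.
  val youngReportedKiB: Long = edenKiB + survivorKiB
  // Full heap = eden + both survivor spaces + old generation.
  val totalHeapKiB: Long = edenKiB + 2 * survivorKiB + oldGenKiB
  val oldGenFreeKiB: Long = oldGenKiB - oldGenUsedKiB

  def main(args: Array[String]): Unit = {
    println(s"PSYoungGen total: $youngReportedKiB KiB")      // 2752512
    println(s"Heap size: ${totalHeapKiB / 1024 / 1024} GiB") // 9
    println(s"Old gen free: $oldGenFreeKiB KiB")              // 80
  }
}
```

A full eden together with an essentially full old generation is consistent with "GC overhead limit exceeded": the collector keeps running but reclaims almost nothing, because the objects are still live.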
