spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Maximum memory limits
Date Sun, 16 Mar 2014 19:24:37 GMT
You should simply use a snapshot built from HEAD of
github.com/apache/spark if you can. The key change is in MLlib and with
any luck you can just replace that bit. See the PR I referenced.
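
If it helps, building a snapshot is roughly the following (hedging on the
exact flags for your Hadoop version; the sbt assembly target is the
documented route):

    git clone https://github.com/apache/spark.git
    cd spark
    sbt/sbt assembly    # produces the assembly jar to deploy on the cluster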

Sure, with enough memory you can get it to run even with the memory issue,
but it could be hundreds of GB at your scale. I'm not sure I take the point
about the JVM; you can give it 64GB of heap and executors can use that
much, sure.
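
To be concrete, a minimal sketch of the two knobs involved, assuming a
standalone-mode deployment (the Scala side works the same elsewhere):

    import org.apache.spark.{SparkConf, SparkContext}

    // App side: the heap handed to each executor JVM. Past ~32GB you
    // mostly buy longer GC pauses, so several mid-sized executors per
    // node tend to behave better than one huge one.
    val conf = new SparkConf()
      .setAppName("als-memory-sketch")
      .set("spark.executor.memory", "32g")
    val sc = new SparkContext(conf)

On the cluster side, SPARK_WORKER_INSTANCES and SPARK_WORKER_MEMORY in
conf/spark-env.sh let you run several worker JVMs per node, which is how
you would get a layout like 6 executors of 16GB each on a 96GB machine.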

You could reduce the number of features a lot to work around it too, or
reduce the input size. (If anyone saw my blog post about StackOverflow and
ALS -- that's why I snuck in a relatively paltry 40 features and pruned
questions with <4 tags :) )
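
Concretely, that workaround is just the rank argument to ALS; a minimal
sketch against stock MLlib (the input path and literals are made up):

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // Assumes an existing SparkContext sc, e.g. in spark-shell.
    val ratings = sc.textFile("hdfs:///tmp/ratings.csv").map { line =>
      val Array(user, item, score) = line.split(',')
      Rating(user.toInt, item.toInt, score.toDouble)
    }

    // Factor storage grows with (#users + #items) x rank, and the dense
    // per-block scratch with rank^2, so a modest rank like 40 is the
    // cheapest lever against this kind of OOM.
    val model = ALS.train(ratings, 40, 10, 0.01)  // rank, iterations, lambda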

I don't think jblas has anything to do with it per se; the allocation
fails in Java code, not native code. This should be exactly what the PR I
mentioned fixes.
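
(To see why it is a plain Java heap failure: jblas's DoubleMatrix wraps
an ordinary double[] allocated by the JVM; native BLAS only touches that
array later, over JNI. A throwaway illustration:)

    import org.jblas.DoubleMatrix

    // zeros(rows, cols) allocates rows*cols doubles as a Java array, so
    // -Xmx / spark.executor.memory bounds it; no native malloc involved.
    val m = DoubleMatrix.zeros(20000, 20000)  // ~3.2GB on the JVM heap
    println(m.data.length)                    // data is a plain double[]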

--
Sean Owen | Director, Data Science | London


On Sun, Mar 16, 2014 at 11:48 AM, Debasish Das <debasish.das83@gmail.com> wrote:

> Thanks Sean... let me get the latest code... do you know which PR it was?
>
> But will the executors run fine with, say, 32 GB or 64 GB of memory? Doesn't
> the JVM show issues when the max memory goes beyond a certain limit...
>
> Also, the failure is due to GC limits from jblas... and I was thinking that
> jblas is going to call native malloc, right? Maybe 64 GB is not a big deal
> then... I will try increasing to 32 and then 64...
>
> java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit
> exceeded)
> org.jblas.DoubleMatrix.<init>(DoubleMatrix.java:323)
> org.jblas.DoubleMatrix.zeros(DoubleMatrix.java:471)
> org.jblas.DoubleMatrix.zeros(DoubleMatrix.java:476)
> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$17.apply(ALSQR.scala:366)
> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$17.apply(ALSQR.scala:366)
> scala.Array$.fill(Array.scala:267)
> com.verizon.bigdata.mllib.recommendation.ALSQR.updateBlock(ALSQR.scala:366)
> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$com$verizon$bigdata$mllib$recommendation$ALSQR$$updateFeatures$2.apply(ALSQR.scala:346)
> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$com$verizon$bigdata$mllib$recommendation$ALSQR$$updateFeatures$2.apply(ALSQR.scala:345)
> org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:32)
> org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:32)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:149)
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:147)
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:147)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
> org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:32)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
> org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:32)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
> org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
> org.apache.spark.scheduler.Task.run(Task.scala:53)
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
>
>
>
> On Sun, Mar 16, 2014 at 11:42 AM, Sean Owen <sowen@cloudera.com> wrote:
>
>> Are you using HEAD or 0.9.0? I know there was a memory issue fixed a few
>> weeks ago that made ALS need far more memory than necessary.
>>
>> https://github.com/apache/incubator-spark/pull/629
>>
>> Try the latest code.
>>
>> --
>> Sean Owen | Director, Data Science | London
>>
>>
>> On Sun, Mar 16, 2014 at 11:40 AM, Debasish Das <debasish.das83@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I gave my Spark job 16 GB of memory and it is running on 8 executors.
>>>
>>> The job needs more memory due to ALS requirements (a 20M x 1M matrix).
>>>
>>> Each node has 96 GB of memory and I am using 16 GB of it. I want to
>>> increase the memory, but I am not sure what the right way to do that
>>> is...
>>>
>>> If I give 96 GB to each of the 8 executors, it might be an issue due to GC...
>>>
>>> Ideally, on 8 nodes I would run 48 executors, each getting 16 GB of
>>> memory... 48 JVMs in total...
>>>
>>> Is it possible to increase the number of executors per node?
>>>
>>> Thanks.
>>> Deb
>>>
>>
>>
>
