mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Exception in task 0.0 in stage 13.0 (TID 13) java.lang.OutOfMemoryError: Java heap space
Date Fri, 12 Feb 2016 21:15:10 GMT
You have to set the executor memory. BTW you have given the driver all memory on the machine.
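
For example, something along these lines (just a sketch, not a drop-in fix: the 8G/6g split, the master URL and the input/output paths are placeholders to adapt to your own setup):

    # spark-env.sh: give the driver less than the full 15 GB so the OS and the executor have headroom
    export SPARK_DRIVER_MEMORY=8G

    # Pass the executor memory on the mahout command line (the flag you already use).
    # Note that with --master local everything runs inside the driver JVM, so the driver
    # setting is the one that actually bounds the heap; the executor setting only takes
    # effect against a real (e.g. standalone) master.
    /opt/mahout/bin/mahout spark-rowsimilarity \
      -i input.dat -o output.tmp \
      --maxObservations 500 --maxSimilaritiesPerRow 100 --omitStrength \
      --master spark://your-master:7077 \
      --sparkExecutorMem 6g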

> On Feb 10, 2016, at 9:30 AM, Jaume Galí <jgali@konodrac.com> wrote:
> 
> Hi again,
> (Sorry for the delay, but we didn't have a machine available to test your suggestions about the memory issue.)
> 
> The problem is still happening when testing with an input matrix of 100k rows by 300 items. I
> increased memory as you suggested but nothing changed. I have attached spark-env.sh and the new
> specs of the machine.
> 
> Machine specs:
> 
> AWS m3.xlarge (Ivy Bridge, 15 GB RAM, 2x40 GB HD)
> 
> This is my spark-env.sh:
> 
> #!/usr/bin/env bash
> # Licensed to ...
>
> export SPARK_HOME=${SPARK_HOME:-/usr/lib/spark}
> export SPARK_LOG_DIR=${SPARK_LOG_DIR:-/var/log/spark}
> export HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}
> export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
> export HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/hive/conf}
>
> export STANDALONE_SPARK_MASTER_HOST=ip-10-12-17-235.eu-west-1.compute.internal
> export SPARK_MASTER_PORT=7077
> export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST
> export SPARK_MASTER_WEBUI_PORT=8080
>
> export SPARK_WORKER_DIR=${SPARK_WORKER_DIR:-/var/run/spark/work}
> export SPARK_WORKER_PORT=7078
> export SPARK_WORKER_WEBUI_PORT=8081
>
> export HIVE_SERVER2_THRIFT_BIND_HOST=0.0.0.0
> export HIVE_SERVER2_THRIFT_PORT=10001
>
> export SPARK_DRIVER_MEMORY=15G
> export SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -XX:OnOutOfMemoryError='kill -9 %p'"
> 
> Log:
> 
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 12.0 failed 1 times, most recent failure: Lost task 0.0 in stage 12.0 (TID 24, localhost): java.lang.OutOfMemoryError: GC overhead limit exceeded
> …….
> …..
> ..
> .
> 
> Driver stacktrace:
> Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> …….
> …..
> ...
> ..
> .
> 
> 
> Thanks in advance
> 
>> On Feb 2, 2016, at 7:48, Pat Ferrel <pat@occamsmachete.com> wrote:
>> 
>> You probably need to increase your driver memory and 8g will not work. 16g is probably
>> the smallest standalone machine that will work since the driver and executors run on it.
>> 
>>> On Feb 1, 2016, at 1:24 AM, jgali@konodrac.com wrote:
>>> 
>>> Hello everybody,
>>> 
>>> We are experiencing problems when we use the "mahout spark-rowsimilarity" operation.
>>> We have an input matrix with 100k rows and 100 items, and the process throws an exception:
>>> "Exception in task 0.0 in stage 13.0 (TID 13) java.lang.OutOfMemoryError: Java heap space".
>>> We have tried increasing the Java heap size, the Mahout heap (MAHOUT_HEAPSIZE) and spark.driver.memory.
>>> 
>>> Environment versions:
>>> Mahout: 0.11.1
>>> Spark: 1.6.0.
>>> 
>>> Mahout command line:
>>> /opt/mahout/bin/mahout spark-rowsimilarity -i 50k_rows__50items.dat -o test_output.tmp --maxObservations 500 --maxSimilaritiesPerRow 100 --omitStrength --master local --sparkExecutorMem 8g
>>> 
>>> This process is running on a machine with the following specifications:
>>> RAM: 8 GB
>>> CPU: 8 cores
>>> 	
>>> .profile file:
>>> export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
>>> export HADOOP_HOME=/opt/hadoop-2.6.0
>>> export SPARK_HOME=/opt/spark
>>> export MAHOUT_HOME=/opt/mahout
>>> export MAHOUT_HEAPSIZE=8192
>>> 
>>> Throws exception:
>>> 	
>>> 16/01/22 11:45:06 ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 13)
>>> java.lang.OutOfMemoryError: Java heap space
>>>      at org.apache.mahout.math.DenseMatrix.<init>(DenseMatrix.java:66)
>>>      at org.apache.mahout.sparkbindings.drm.package$$anonfun$blockify$1.apply(package.scala:70)
>>>      at org.apache.mahout.sparkbindings.drm.package$$anonfun$blockify$1.apply(package.scala:59)
>>>      at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>>      at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>>      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>      at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>      at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>      at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>      at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>      at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>      at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>>      at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>>      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>>      at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>      at java.lang.Thread.run(Thread.java:745)
>>> 16/01/22 11:45:06 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@12498227,BlockManagerId(driver, localhost, 42107))] in 1 attempts
>>> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
>>>      at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
>>>      at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
>>>      at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>>>      at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
>>>      at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
>>>      at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>>>      at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
>>>      at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:448)
>>>      at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:468)
>>>      at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
>>>      at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
>>>      at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
>>>      at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:468)
>>>      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>      at java.lang.Thread.run(Thread.java:745)
>>> 16/01/22 11:45:06 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@12498227,BlockManagerId(driver, localhost, 42107))] in 1 attempts
>>> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.askTimeout
>>>      at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
>>>      at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
>>>      at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>>>      at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
>>>      at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
>>>      at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>>>      at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
>>>      at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:448)
>>>      at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:468)
>>>      at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
>>>      at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:468)
>>>      at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741)
>>>      at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:468)
>>>      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>      at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
>>>      at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>>      at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>>      at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>>      at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>>      at scala.concurrent.Await$.result(package.scala:107)
>>>      at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>>>      ...
>>> 
>>> Can you please advise?
>>> 
>>> 
>>> Thanks in advance.
>>> Cheers.
>> 
> 

