spark-user mailing list archives

From Jaonary Rabarisoa <jaon...@gmail.com>
Subject java.lang.OutOfMemoryError: Java heap space when running job via spark-submit
Date Thu, 09 Oct 2014 16:00:16 GMT
Dear all,

I have a Spark job with the following configuration:

val conf = new SparkConf()
     .setAppName("My Job")
     .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
     .set("spark.kryo.registrator", "value.serializer.Registrator")
     .setMaster("local[4]")
     .set("spark.executor.memory", "4g")


that I can run manually with sbt run without any problem.
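(For reference, the heap available under sbt run is whatever sbt's own JVM, or the forked run JVM, was given; it is not set by SparkConf. My build settings look roughly like this; the 4g value here is just illustrative, not a confirmed setting:)

```scala
// build.sbt sketch (sbt 0.13 syntax, values illustrative):
// fork `run` into its own JVM so the heap is set explicitly
// instead of being inherited from sbt's JVM.
fork in run := true
javaOptions in run += "-Xmx4g" // assumed heap size for illustration
```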

But when I try to run the same job with spark-submit,

./spark-1.1.0-bin-hadoop2.4/bin/spark-submit \
     --class value.jobs.MyJob \
     --master local[4] \
     --conf spark.executor.memory=4g \
     --conf spark.driver.memory=2g \
     target/scala-2.10/my-job_2.10-1.0.jar


I get the following error:

Exception in thread "stdin writer for List(patch_matching_similarity)"
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2271)
	at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
	at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
	at com.esotericsoftware.kryo.io.Output.flush(Output.java:155)
	at com.esotericsoftware.kryo.io.Output.require(Output.java:135)
	at com.esotericsoftware.kryo.io.Output.writeString_slow(Output.java:420)
	at com.esotericsoftware.kryo.io.Output.writeString(Output.java:326)
	at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:153)
	at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146)
	at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:549)
	at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:570)
	at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
	at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:119)
	at org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:110)
	at org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:1047)
	at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1056)
	at org.apache.spark.storage.MemoryStore.putArray(MemoryStore.scala:93)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:745)
	at org.apache.spark.storage.BlockManager.putArray(BlockManager.scala:625)
	at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:167)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
	at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	at org.apache.spark.rdd.CartesianRDD$$anonfun$compute$1.apply(CartesianRDD.scala:75)
	at org.apache.spark.rdd.CartesianRDD$$anonfun$compute$1.apply(CartesianRDD.scala:74)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)


I don't understand why, since I set the same amount of memory in both
cases.
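Could it be that in local mode the executor memory setting doesn't apply? If the executors run inside the driver JVM under local[4], then the heap would be bounded by the 2g of driver memory rather than by spark.executor.memory, and something like this sketch (not a confirmed fix) might behave differently:

```shell
# Sketch: same job, but raising the driver heap instead of the
# executor heap, on the assumption that in local[4] mode everything
# runs inside the driver JVM.
./spark-1.1.0-bin-hadoop2.4/bin/spark-submit \
     --class value.jobs.MyJob \
     --master local[4] \
     --driver-memory 4g \
     target/scala-2.10/my-job_2.10-1.0.jar
```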

Any ideas would be helpful. I am using Spark 1.1.0.

Cheers,

Jao
