http://spark.apache.org/docs/latest/tuning.html#level-of-parallelism



On Fri, Aug 1, 2014 at 1:29 PM, Haiyang Fu <haiyangfu512@gmail.com> wrote:
Hi,
here are two tips for you:
1. increase the parallelism level
2. increase the driver memory
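
For example, something along these lines (a sketch only — the values are illustrative assumptions, not tuned numbers):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("ApproxStrMatch")
  .setMaster("local[2]")
  // Tip 1: more partitions per shuffle means smaller tasks that are
  // cheaper to serialize, which helps avoid heap-space OOMs.
  .set("spark.default.parallelism", "64") // illustrative value
val sc = new SparkContext(conf)

// Tip 2: in local mode the "executors" run inside the driver JVM, so it is
// the driver heap that must grow. It has to be set before the JVM starts,
// e.g. via:
//   spark-submit --driver-memory 4g ...
// (setting spark.driver.memory in SparkConf after launch has no effect).
```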


On Fri, Aug 1, 2014 at 12:58 AM, Sameer Tilak <sstilak@live.com> wrote:
Hi everyone,
I have the following configuration. I am currently running my app in local mode.

  val conf = new SparkConf()
    .setMaster("local[2]")
    .setAppName("ApproxStrMatch")
    .set("spark.executor.memory", "3g")
    .set("spark.storage.memoryFraction", "0.1")

I am getting the following error. I tried setting spark.executor.memory and the memory fraction, but my UI does not show the increase and I still get these errors. I am loading a TSV file (around 5 GB) from HDFS. Does this mean I should update these settings and add more memory, or is it something else? The Spark master has 24 GB of physical memory and the workers have 16 GB, but we are running other services (CDH 5.1) on these nodes as well.
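
(The log output below shows the stage running with only 2 partitions — "2 non-empty blocks out of 2 blocks" — so each task handles a very large slice of the input. One way to raise the parallelism for a large file is to ask for more partitions at load time; a hypothetical sketch, where the path and partition count are illustrative, not from my actual job:

```scala
// Request more input partitions so each task deserializes a smaller slice.
val lines = sc.textFile("hdfs:///path/to/data.tsv", 200)
```
)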

14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 6 ms
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 6 ms
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 1 ms
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 1 ms
14/07/31 09:48:17 ERROR Executor: Exception in task ID 5
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/07/31 09:48:17 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-3,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/07/31 09:48:17 WARN TaskSetManager: Lost TID 5 (task 1.0:0)
14/07/31 09:48:17 WARN TaskSetManager: Loss was due to java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/07/31 09:48:17 ERROR TaskSetManager: Task 1.0:0 failed 1 times; aborting job
14/07/31 09:48:17 INFO TaskSchedulerImpl: Cancelling stage 1
14/07/31 09:48:17 INFO DAGScheduler: Failed to run collect at ComputeScores.scala:76
14/07/31 09:48:17 INFO Executor: Executor is trying to kill task 6
14/07/31 09:48:17 INFO TaskSchedulerImpl: Stage 1 was cancelled