spark-issues mailing list archives

From "Pat McDonough (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1392) Local spark-shell Runs Out of Memory With Default Settings
Date Wed, 02 Apr 2014 06:01:22 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957379#comment-13957379 ]

Pat McDonough commented on SPARK-1392:
--------------------------------------

Running the following with the attached data results in the errors below:
{code}
scala> val explore = sc.textFile("/Users/pat/Projects/training-materials/Data/wiki_links")
...
scala> explore.cache
res1: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12
...
scala> explore.count
...
14/04/01 22:52:48 INFO HadoopRDD: Input split: file:/Users/pat/Projects/training-materials/Data/wiki_links/part-00007:0+25009430
14/04/01 22:52:54 INFO MemoryStore: ensureFreeSpace(55520836) called with curMem=271402430, maxMem=309225062
14/04/01 22:52:54 INFO MemoryStore: Will not store rdd_1_7 as it would require dropping another block from the same RDD
14/04/01 22:52:54 INFO BlockManager: Dropping block rdd_1_7 from memory
14/04/01 22:52:54 WARN BlockManager: Block rdd_1_7 could not be dropped from memory as it does not exist
14/04/01 22:52:54 INFO BlockManagerMaster: Updated info of block rdd_1_7
14/04/01 22:52:54 INFO BlockManagerMaster: Updated info of block rdd_1_7
14/04/01 22:52:54 INFO Executor: Serialized size of result for 7 is 563
14/04/01 22:52:54 INFO Executor: Sending result for 7 directly to driver
14/04/01 22:52:54 INFO Executor: Finished task ID 7
14/04/01 22:52:54 INFO TaskSetManager: Starting task 0.0:8 as TID 8 on executor localhost: localhost (PROCESS_LOCAL)
14/04/01 22:52:54 INFO TaskSetManager: Serialized task 0.0:8 as 1606 bytes in 2 ms
14/04/01 22:52:54 INFO Executor: Running task ID 8
14/04/01 22:52:54 INFO TaskSetManager: Finished TID 7 in 6714 ms on localhost (progress: 7/10)
14/04/01 22:52:54 INFO DAGScheduler: Completed ResultTask(0, 7)
14/04/01 22:52:54 INFO BlockManager: Found block broadcast_0 locally
14/04/01 22:52:54 INFO CacheManager: Partition rdd_1_8 not found, computing it
14/04/01 22:52:54 INFO HadoopRDD: Input split: file:/Users/pat/Projects/training-materials/Data/wiki_links/part-00008:0+25904930
14/04/01 22:52:59 INFO TaskSetManager: Starting task 0.0:9 as TID 9 on executor localhost: localhost (PROCESS_LOCAL)
14/04/01 22:52:59 ERROR Executor: Exception in task ID 8
{code}


{noformat}
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
	at java.nio.CharBuffer.allocate(CharBuffer.java:331)
	at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)
	at org.apache.hadoop.io.Text.decode(Text.java:405)
	at org.apache.hadoop.io.Text.decode(Text.java:382)
	at org.apache.hadoop.io.Text.toString(Text.java:280)
	at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:344)
	at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:344)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:75)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
	at org.apache.spark.scheduler.Task.run(Task.scala:53)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
{noformat}
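
For context, the maxMem=309225062 figure in the log is consistent with the default storage budget. As a rough, hedged sketch (assuming Spark 0.9 sizes its MemoryStore as spark.storage.memoryFraction, default 0.6, times the JVM's reported max heap), the out-of-the-box 512m heap leaves roughly 295 MB for cached blocks, which the attached dataset exceeds once deserialized:
{code}
// Back-of-the-envelope check of the storage budget under the defaults (illustrative only)
val maxHeap = Runtime.getRuntime.maxMemory             // reported max heap; a bit under the nominal 512m default
val memoryFraction = 0.6                               // spark.storage.memoryFraction default
val storageBudget = (maxHeap * memoryFraction).toLong  // on the order of the maxMem=309225062 seen above
{code}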


> Local spark-shell Runs Out of Memory With Default Settings
> ----------------------------------------------------------
>
>                 Key: SPARK-1392
>                 URL: https://issues.apache.org/jira/browse/SPARK-1392
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.0
>         Environment: OS X 10.9.2, Java 1.7.0_51, Scala 2.10.3
>            Reporter: Pat McDonough
>
> Using the spark-0.9.0 Hadoop2 binary from the project download page, running the spark-shell locally in the out-of-the-box configuration, and attempting to cache all the attached data, Spark OOMs with: java.lang.OutOfMemoryError: GC overhead limit exceeded
> You can work around the issue by either decreasing spark.storage.memoryFraction or increasing SPARK_MEM.
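
As a hedged illustration of that workaround (the values are arbitrary, and SPARK_JAVA_OPTS is assumed here as the way to pass the system property to the 0.9 shell), one could launch spark-shell with either a larger heap or a smaller storage fraction:
{code}
# Give the shell a larger heap (illustrative value)
SPARK_MEM=2g ./bin/spark-shell

# ...or keep the default heap and shrink the storage fraction (illustrative value)
SPARK_JAVA_OPTS="-Dspark.storage.memoryFraction=0.4" ./bin/spark-shell
{code}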



--
This message was sent by Atlassian JIRA
(v6.2#6252)
