spark-issues mailing list archives

From "Alex Baretta (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-5314) java.lang.OutOfMemoryError in SparkSQL with GROUP BY
Date Mon, 19 Jan 2015 05:02:34 GMT
Alex Baretta created SPARK-5314:
-----------------------------------

             Summary: java.lang.OutOfMemoryError in SparkSQL with GROUP BY
                 Key: SPARK-5314
                 URL: https://issues.apache.org/jira/browse/SPARK-5314
             Project: Spark
          Issue Type: Bug
            Reporter: Alex Baretta


I am running a SparkSQL GROUP BY query on a largish Parquet table (a few hundred million rows),
weighing in at about 50 GB. My cluster has 1.7 TB of RAM, so it should have more than enough
resources to cope with this query.
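
For reference, a minimal sketch of the kind of job described above, using the
Spark 1.2-era SparkSQL API (the path, table, and column names are placeholders;
the report does not include the actual query):

    // Hypothetical reproduction sketch; all names are placeholders.
    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc) // `sc` is the shell's SparkContext

    // Load the ~50 GB Parquet table and register it for SQL queries.
    val table = sqlContext.parquetFile("hdfs:///path/to/table.parquet")
    table.registerTempTable("t")

    // A GROUP BY with a SUM aggregate. Per the stack trace below, the running
    // sum is maintained through Coalesce/Add expressions
    // (SumFunction.update -> Coalesce.eval), which is where the GC overhead
    // limit is exceeded.
    val grouped = sqlContext.sql("SELECT key, SUM(value) FROM t GROUP BY key")
    grouped.count() // force execution without collecting all groups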

WARN TaskSetManager: Lost task 279.0 in stage 22.0 (TID 1229, ds-model-w-21.c.eastern-gravity-771.internal):
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at scala.collection.SeqLike$class.distinct(SeqLike.scala:493)
        at scala.collection.AbstractSeq.distinct(Seq.scala:40)
        at org.apache.spark.sql.catalyst.expressions.Coalesce.resolved$lzycompute(nullFunctions.scala:33)
        at org.apache.spark.sql.catalyst.expressions.Coalesce.resolved(nullFunctions.scala:33)
        at org.apache.spark.sql.catalyst.expressions.Coalesce.dataType(nullFunctions.scala:37)
        at org.apache.spark.sql.catalyst.expressions.Expression.n2(Expression.scala:100)
        at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:101)
        at org.apache.spark.sql.catalyst.expressions.Coalesce.eval(nullFunctions.scala:50)
        at org.apache.spark.sql.catalyst.expressions.MutableLiteral.update(literals.scala:81)
        at org.apache.spark.sql.catalyst.expressions.SumFunction.update(aggregates.scala:571)
        at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:167)
        at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:151)
        at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:615)
        at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:615)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:264)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:231)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:264)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:231)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
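
A possible mitigation sketch, assuming the failure is per-executor heap
pressure during the shuffle-side aggregation rather than a cluster-wide
shortage of memory (these are standard Spark/SparkSQL settings, not a
confirmed fix for this issue):

    // Spread the aggregation across more, smaller shuffle partitions so each
    // task materializes fewer groups at a time. The default for
    // spark.sql.shuffle.partitions is 200; 2000 here is an untested guess.
    sqlContext.setConf("spark.sql.shuffle.partitions", "2000")

    // Larger executor heaps can also be requested at submit time, e.g.:
    //   spark-submit --executor-memory 16g ...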




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
