spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-15317) JobProgressListener takes a huge amount of memory with iterative DataFrame program in local mode
Date Fri, 13 May 2016 21:24:12 GMT
Joseph K. Bradley created SPARK-15317:
-----------------------------------------

             Summary: JobProgressListener takes a huge amount of memory with iterative DataFrame program in local mode
                 Key: SPARK-15317
                 URL: https://issues.apache.org/jira/browse/SPARK-15317
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.0
         Environment: Spark 2.0, local mode + standalone mode on MacBook Pro OSX 10.9
            Reporter: Joseph K. Bradley


Running a small test locally, I found JobProgressListener consuming a huge amount of memory.
The test does run many tasks, but the memory use is still surprising.  Summary, with details below:
* Spark app: series of DataFrame joins
* Issue: GC pressure, eventually leading to a fatal exception
* Heap dump shows JobProgressListener taking 150 - 400MB, depending on the Spark mode/version
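A possible mitigation, which I have not verified against this test (the property names are from the standard Spark configuration, but whether they bound JobProgressListener's footprint here is an assumption): the UI retains metadata for completed jobs and stages up to the {{spark.ui.retainedJobs}} / {{spark.ui.retainedStages}} limits (default 1000 each), so lowering them should reduce what the listener holds, e.g. in {{conf/spark-defaults.conf}} or via {{--conf}}:

```
# Hypothetical workaround sketch: cap how many completed jobs/stages
# the UI listener keeps in memory (defaults are 1000 each).
spark.ui.retainedJobs    100
spark.ui.retainedStages  100
```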

The code which fails:
* Here is a branch with the code snippet which fails: [https://github.com/jkbradley/spark/tree/18836174ab190d94800cc247f5519f3148822dce]
** This is based on Spark commit hash: bb1362eb3b36b553dca246b95f59ba7fd8adcc8a
* Look at {{CC.scala}}, which implements connected components using DataFrames: [https://github.com/jkbradley/spark/blob/18836174ab190d94800cc247f5519f3148822dce/mllib/src/main/scala/org/apache/spark/ml/CC.scala]

In the spark shell, run:
{code}
import org.apache.spark.ml.CC
import org.apache.spark.sql.SQLContext
val sqlContext = SQLContext.getOrCreate(sc)
CC.runTest(sqlContext)
{code}
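For reference, the algorithm {{CC.scala}} implements can be sketched in plain Scala collections (this is only an illustration of the connected-components iteration, with hypothetical names; it is not the attached DataFrame implementation, which expresses each step as a self-join):

```scala
// Minimal connected-components sketch: every vertex repeatedly adopts
// the smallest component id among itself and its neighbors, looping
// until no label changes.  The DataFrame version performs the same
// propagation step as an iterated join, which is what generates the
// many jobs/tasks that JobProgressListener tracks.
object ConnectedComponentsSketch {
  def run(vertices: Seq[Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
    // Start with each vertex in its own component.
    var labels: Map[Long, Long] = vertices.map(v => v -> v).toMap
    // Treat edges as undirected by adding both directions.
    val undirected = edges ++ edges.map { case (a, b) => (b, a) }
    var changed = true
    while (changed) {
      changed = false
      for ((src, dst) <- undirected) {
        if (labels(src) < labels(dst)) {
          labels = labels.updated(dst, labels(src))
          changed = true
        }
      }
    }
    labels
  }
}
```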

I have attached a file {{cc_traces.txt}} with the stack traces from running {{runTest}}.
Note that I sometimes had to run {{runTest}} twice to trigger the fatal exception.  The file also
includes a trace for 1.6, which should run without modifications to {{CC.scala}}.

I used {{jmap}} to dump the heap for both 1.6 and 2.0, and I have attached screenshots summarizing
those dumps in Eclipse Memory Analyzer (MAT):
* TODO

Both 1.6 and 2.0 exhibit this issue.  2.0 ran faster, and the issue (JobProgressListener allocation)
seems more severe in 2.0, though that could simply be because 2.0 makes more progress and runs
more jobs.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
