spark-issues mailing list archives

From "Ilya Ganelin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-4927) Spark does not clean up properly during long jobs.
Date Tue, 23 Dec 2014 02:25:13 GMT
Ilya Ganelin created SPARK-4927:
-----------------------------------

             Summary: Spark does not clean up properly during long jobs. 
                 Key: SPARK-4927
                 URL: https://issues.apache.org/jira/browse/SPARK-4927
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.1.0
            Reporter: Ilya Ganelin


On a long-running Spark job, Spark will eventually run out of memory on the driver node due
to metadata overhead from shuffle operations. Spark will continue to operate, but with
drastically decreased performance (since swapping now occurs with every operation).
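
For illustration, a minimal sketch (not taken from the affected job) of the kind of driver loop that reproduces the accumulation; the object, app name, and data sizes are hypothetical:

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // PairRDDFunctions implicits on 1.1.x

// Hypothetical reproduction sketch: a driver loop that keeps launching
// shuffle stages. Each reduceByKey registers new shuffle/map-output metadata
// on the driver; over many iterations this metadata accumulates until the
// driver starts swapping.
object LongRunningShuffleJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("long-running-shuffle")
    val sc = new SparkContext(conf)

    while (true) {
      val counts = sc.parallelize(0 until 1000000)
        .map(i => (i % 1000, 1L))   // (key, count) pairs
        .reduceByKey(_ + _)         // new shuffle dependency every iteration
      counts.count()                // materialize the stage
    }
  }
}
{code}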

The spark.cleaner.ttl parameter allows a user to configure when cleanup happens, but the issue
with doing this is that it is not done safely: if the cleanup clears a cached RDD or active
task in the middle of processing a stage, this ultimately causes a KeyNotFoundException when
the next stage attempts to reference the cleared RDD or task.
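
For reference, a sketch of how the TTL cleaner would be enabled on the driver, assuming a SparkConf built in application code; the 3600-second value is arbitrary:

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative configuration of the TTL-based cleaner. Metadata older than
// the TTL is dropped periodically, which is the unsafe behavior described
// above: a cached RDD or in-flight stage older than the TTL can be cleared
// while it is still referenced.
val conf = new SparkConf()
  .setAppName("ttl-cleanup-example")
  .set("spark.cleaner.ttl", "3600") // seconds
val sc = new SparkContext(conf)
{code}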

There should be a sustainable mechanism for cleaning up stale metadata that allows the program
to continue running. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


