spark-issues mailing list archives

From "Dennis Lawler (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-4808) Spark fails to spill with small number of large objects
Date Wed, 10 Dec 2014 01:51:12 GMT
Dennis Lawler created SPARK-4808:
------------------------------------

             Summary: Spark fails to spill with small number of large objects
                 Key: SPARK-4808
                 URL: https://issues.apache.org/jira/browse/SPARK-4808
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.1.0, 1.0.2, 1.2.0, 1.2.1
            Reporter: Dennis Lawler


Spillable's maybeSpill does not allow a spill to occur until at least 1000 elements have been
read into the collection, and then only evaluates whether to spill on every 32nd element
thereafter.  When a small number of very large objects is being tracked, out-of-memory
conditions can occur before the spill check ever runs.
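A minimal sketch of the gating behavior described above (an illustration, not the actual Spark source; the constant name and the workload numbers are assumptions for the example):

```python
# Hypothetical sketch of the Spillable.maybeSpill gating described in this
# report: spilling is only considered once more than 1000 elements have been
# read, and then only on every 32nd element.
TRACK_MEMORY_THRESHOLD = 1000  # assumed name for the 1000-element threshold

def maybe_spill(elements_read: int, current_memory: int, memory_limit: int) -> bool:
    """Return True if the collection should spill to disk."""
    if elements_read > TRACK_MEMORY_THRESHOLD and elements_read % 32 == 0:
        return current_memory >= memory_limit
    # Below the threshold the memory estimate is never even consulted.
    return False

# Example: 100 objects of 50 MB each (5 GB total) against a 512 MB limit.
# The element count never exceeds 1000, so no spill check ever fires,
# even though memory usage passes the limit after the 11th object.
spilled = any(
    maybe_spill(n, n * 50 * 2**20, 512 * 2**20) for n in range(1, 101)
)
```

Here `spilled` ends up False, which is exactly the failure mode reported: the collection grows far past its memory limit without a single spill.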

I suspect that both the 1000-element threshold and the every-32nd-element behavior were
intended to reduce the cost of the estimateSize() call.  That logic has since been extracted
into SizeTracker, which implements its own exponential backoff for size estimation, so the
threshold now only avoids using an estimated size that has already been computed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

