spark-issues mailing list archives

From "Matei Zaharia (JIRA)" <>
Subject [jira] [Commented] (SPARK-3466) Limit size of results that a driver collects for each action
Date Tue, 21 Oct 2014 06:31:33 GMT


Matei Zaharia commented on SPARK-3466:

Ah, I see, that concern makes sense if the total size of the results is large, even though
each result might be small.

For large task results, the driver should be fetching them from executors using the block
manager. This means that once they get stored into the block manager on each worker, the driver
can choose whether it wants to fetch them all at once or not. I'd go for a solution like the
following:
- Each task only adds a result to its local block store if it's smaller than the limit (otherwise
it can throw an error right there).
- The result fetcher in the driver is updated to track the total size; this might be trickier,
since I believe it can currently fetch results concurrently.
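The two checks above can be sketched as follows. This is a minimal, Spark-independent illustration, assuming hypothetical names (ResultSizeLimiter, checkTaskResult, recordFetched) and a fixed 100 MB limit; it is not the actual Spark implementation:

```scala
// Illustrative sketch of the proposed checks; all names are hypothetical.
object ResultSizeLimiter {
  // Assumed limit for illustration (100 MB).
  val maxResultSize: Long = 100L * 1024 * 1024

  // Task side: refuse to store an oversized result in the local block store.
  def checkTaskResult(resultBytes: Long): Unit =
    if (resultBytes > maxResultSize)
      throw new IllegalStateException(
        s"Task result of $resultBytes bytes exceeds limit of $maxResultSize bytes")

  // Driver side: track the running total across fetched results.
  // A synchronized counter addresses the concurrent-fetch concern.
  private var totalFetched: Long = 0L

  def recordFetched(resultBytes: Long): Unit = synchronized {
    totalFetched += resultBytes
    if (totalFetched > maxResultSize)
      throw new IllegalStateException(
        s"Total result size of $totalFetched bytes exceeds limit of $maxResultSize bytes")
  }

  def total: Long = synchronized { totalFetched }
}
```

The driver-side counter would abort the job on the first fetch that pushes the total over the limit, rather than after all results have been pulled back.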

> Limit size of results that a driver collects for each action
> ------------------------------------------------------------
>                 Key: SPARK-3466
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Matei Zaharia
>            Assignee: Matthew Cheah
> Right now, operations like {{collect()}} and {{take()}} can crash the driver with an
OOM if they bring back too much data. We should add a {{spark.driver.maxResultSize}} setting
(or something like that) that will make the driver abort a job if its result is too big. We
can set it to some fraction of the driver's memory by default, or to something like 100 MB.
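If the setting were adopted under the proposed name, it might be supplied like any other Spark application property. This is a hypothetical config fragment; the name and the "100m" value follow the suggestion above and could differ in a final implementation:

```shell
# Hypothetical: pass the proposed limit as an application property.
# Class name and jar are placeholders.
spark-submit \
  --conf spark.driver.maxResultSize=100m \
  --class com.example.MyApp \
  my-app.jar
```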

This message was sent by Atlassian JIRA

