spark-issues mailing list archives

From "Aaron Davidson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1767) Prefer HDFS-cached replicas when scheduling data-local tasks
Date Wed, 14 May 2014 16:59:14 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997761#comment-13997761 ]

Aaron Davidson commented on SPARK-1767:
---------------------------------------

One simple workaround for this is to make sure that partitions that are in memory are
ordered first in the list of partitions, as Spark will try to place executors based on the
order in this list. This is, of course, not a complete solution: we would not use the
locality-wait logic within Spark and would immediately fall back to a non-cached node if the
cached node was busy, rather than waiting for some period of time for the cached node to become
available.
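
The ordering idea above can be sketched in a few lines of Python. This is only an
illustration of "cached entries first", not Spark code: the Replica type, the
cached flag, the cached_first helper, and the node names are all hypothetical and
do not correspond to any Spark or HDFS API.

```python
from typing import List, NamedTuple


class Replica(NamedTuple):
    host: str     # hypothetical: node holding a copy of the data
    cached: bool  # hypothetical: True if that copy is held in memory


def cached_first(replicas: List[Replica]) -> List[str]:
    """Return hosts with in-memory replicas ahead of the others.

    sorted() is stable, so the original relative order is preserved
    within the cached group and within the non-cached group.
    """
    # False sorts before True, so "not cached" puts cached replicas first.
    ordered = sorted(replicas, key=lambda r: not r.cached)
    return [r.host for r in ordered]


replicas = [
    Replica("node-a", cached=False),
    Replica("node-b", cached=True),
    Replica("node-c", cached=False),
    Replica("node-d", cached=True),
]

# A scheduler that walks this list in order would try cached nodes first.
preferred = cached_first(replicas)
print(preferred)  # ['node-b', 'node-d', 'node-a', 'node-c']
```

As noted above, reordering alone says nothing about waiting: a scheduler that
simply takes the first free host from this list would still move on to node-a
as soon as node-b and node-d are busy.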

> Prefer HDFS-cached replicas when scheduling data-local tasks
> ------------------------------------------------------------
>
>                 Key: SPARK-1767
>                 URL: https://issues.apache.org/jira/browse/SPARK-1767
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Sandy Ryza
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
