spark-issues mailing list archives

From "Karthik Natarajan (JIRA)" <>
Subject [jira] [Commented] (SPARK-1529) Support DFS based shuffle in addition to Netty shuffle
Date Mon, 09 Oct 2017 18:29:00 GMT


Karthik Natarajan commented on SPARK-1529:

Hello [~rkannan82]

Are there any updates on this feature? I was looking for something similar as well. Do you
happen to have any comparisons between using HDFS to read/write shuffle data and using local
disk + Netty?


> Support DFS based shuffle in addition to Netty shuffle
> ------------------------------------------------------
>                 Key: SPARK-1529
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Patrick Wendell
>            Assignee: Kannan Rajah
>         Attachments: Spark Shuffle using HDFS.pdf
> In some environments, like with MapR, local volumes are accessed through the Hadoop filesystem
> interface. Shuffle is implemented by writing intermediate data to local disk and serving it
> to remote nodes using Netty as a transport mechanism. We want to provide an HDFS based shuffle
> such that data can be written to HDFS (instead of local disk) and served using the HDFS API on
> the remote nodes. This could involve exposing a file system abstraction to Spark shuffle and
> having two modes of running it. In the default mode, it will write to local disk; in DFS mode,
> it will write to HDFS.
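
For illustration only, the "file system abstraction with two modes" idea in the description could be sketched roughly as follows. This is not Spark's actual shuffle code: the names ShuffleStore, LocalDiskShuffleStore, ShuffleStores and the mode strings "local" and "dfs" are all hypothetical. Only the default local-disk mode is implemented here; a DFS mode would presumably implement the same interface on top of the Hadoop FileSystem API, which is left out to keep the sketch free of external dependencies.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical abstraction: where shuffle blocks are written and read back. */
interface ShuffleStore {
    OutputStream create(String blockId) throws IOException;
    InputStream open(String blockId) throws IOException;
}

/** Default mode (assumed): blocks are plain files under a local directory. */
final class LocalDiskShuffleStore implements ShuffleStore {
    private final Path root;

    LocalDiskShuffleStore(Path root) throws IOException {
        // Create the root directory if it does not exist yet.
        this.root = Files.createDirectories(root);
    }

    @Override
    public OutputStream create(String blockId) throws IOException {
        return Files.newOutputStream(root.resolve(blockId));
    }

    @Override
    public InputStream open(String blockId) throws IOException {
        return Files.newInputStream(root.resolve(blockId));
    }
}

/** Hypothetical factory that picks a store from a mode string. */
final class ShuffleStores {
    private ShuffleStores() {}

    static ShuffleStore forMode(String mode, Path localRoot) throws IOException {
        switch (mode) {
            case "local":
                return new LocalDiskShuffleStore(localRoot);
            case "dfs":
                // A DFS-backed store would go here; not implemented in this sketch.
                throw new UnsupportedOperationException("DFS mode is not implemented in this sketch");
            default:
                throw new IllegalArgumentException("Unknown shuffle mode: " + mode);
        }
    }
}
```

With an interface like this, the code that writes and fetches shuffle blocks would not need to know which mode is active; only the factory call would change.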
