spark-issues mailing list archives

From "Liang-Chi Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-17556) Executor side broadcast for broadcast joins
Date Fri, 23 Sep 2016 14:52:20 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15516668#comment-15516668 ]

Liang-Chi Hsieh commented on SPARK-17556:
-----------------------------------------

No. It doesn't.

I think the point is not only the overhead on the driver, but also the extra latency mentioned
in the JIRA description.

With the solution in my PR, every executor fetches the RDD content directly from the other executors.
It doesn't "collect the data first and then broadcast it".
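To make the difference in data paths concrete, here is a toy Python model. It is not Spark code and not the implementation in the PR. Everything in it is a hypothetical stand-in: the names `driver_collect_then_broadcast`, `executor_side_broadcast` and `rows_received_by`, the executor ids, and the data. The model also simplifies the mechanics. In the driver-side path only the collect phase is logged, and the way the broadcast is then distributed is not modeled. In the executor-side path, each executor is assumed to fetch every missing partition directly from its peers. The model shows two things: both paths end with the same table on every executor, but only the first one routes rows through the driver, and only after the collect has finished.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model (not Spark code): executor id -> partitions of the
# build-side relation it holds; each partition is a list of join keys.
Partitions = Dict[str, List[List[int]]]
# A transfer is (source node, destination node, number of rows moved).
Transfer = Tuple[str, str, int]


def driver_collect_then_broadcast(parts: Partitions) -> Tuple[List[Transfer], Dict[str, List[int]]]:
    """Collect every partition on the driver first, then give each executor
    the full table. Only the collect phase is logged as transfers; how the
    broadcast itself is distributed is deliberately not modeled."""
    transfers: List[Transfer] = []
    collected: List[int] = []
    for ex, ex_parts in parts.items():
        for p in ex_parts:
            transfers.append((ex, "driver", len(p)))  # every row hops through the driver
            collected.extend(p)
    # The broadcast can only start once the whole collect above has finished.
    tables = {ex: sorted(collected) for ex in parts}
    return transfers, tables


def executor_side_broadcast(parts: Partitions) -> Tuple[List[Transfer], Dict[str, List[int]]]:
    """Each executor keeps its own partitions and fetches the rest directly
    from its peers (simplified as all-to-all); the driver moves no rows."""
    transfers: List[Transfer] = []
    tables: Dict[str, List[int]] = {}
    for ex, own in parts.items():
        table = [row for p in own for row in p]
        for peer, peer_parts in parts.items():
            if peer == ex:
                continue
            for p in peer_parts:
                transfers.append((peer, ex, len(p)))  # executor-to-executor fetch
                table.extend(p)
        # Sort so tables compare equal regardless of fetch order.
        tables[ex] = sorted(table)
    return transfers, tables


def rows_received_by(transfers: List[Transfer], node: str) -> int:
    """Total number of rows delivered to `node` across all logged transfers."""
    return sum(n for _, dst, n in transfers if dst == node)


if __name__ == "__main__":
    parts = {"exec-1": [[1, 2], [3]], "exec-2": [[4, 5, 6]], "exec-3": [[7]]}
    t1, tables1 = driver_collect_then_broadcast(parts)
    t2, tables2 = executor_side_broadcast(parts)
    print(rows_received_by(t1, "driver"), rows_received_by(t2, "driver"))  # 7 0
    print(tables1 == tables2)  # True
```

In this model the driver receives all 7 rows in the first path and none in the second, while every executor ends up with the same table either way. Real latency also depends on block sizes, network topology and how the broadcast is distributed, and none of that is modeled here.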



> Executor side broadcast for broadcast joins
> -------------------------------------------
>
>                 Key: SPARK-17556
>                 URL: https://issues.apache.org/jira/browse/SPARK-17556
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, SQL
>            Reporter: Reynold Xin
>         Attachments: executor broadcast.pdf
>
>
> Currently in Spark SQL, in order to perform a broadcast join, the driver must collect
> the result of an RDD and then broadcast it. This introduces some extra latency. It might be
> possible to broadcast directly from executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


