spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From paleyl <paley2...@gmail.com>
Subject Re: about broadcast join of base table in spark sql
Date Sun, 02 Jul 2017 07:06:33 GMT
Thank you for your reply, but when I remove the left join option(like A =
A.join(B,A("key1") === B("key2"))), it can be broadcast out. there is no
reason spark cannot get table size when left join option is chosen on.


On Sun, Jul 2, 2017 at 1:55 PM, Xiaoye Sun <sunxiaoye07@gmail.com> wrote:

> you may need to check if spark can get the size of your table. If spark
> cannot get the table size, it won't do broadcast.
>
> On Sat, Jul 1, 2017 at 11:37 PM Paley Louie <paley2009@gmail.com> wrote:
>
>> Thank you for your reply, I have tried to add broadcast hint to the base
>> table, but it just cannot be broadcast out.
>>
>> On Jun 30, 2017, at 9:13 PM, Yong Zhang <java8964@hotmail.com> wrote:
>>
>> Or since you already use the DataFrame API, instead of SQL, you can add
>> the broadcast function to force it.
>>
>> https://spark.apache.org/docs/1.6.2/api/java/org/apache/
>> spark/sql/functions.html#broadcast(org.apache.spark.sql.DataFrame)
>>
>> Yong
>> functions - Apache Spark
>> <https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html#broadcast(org.apache.spark.sql.DataFrame)>
>> spark.apache.org
>> Computes the numeric value of the first character of the string column,
>> and returns the result as a int column.
>>
>>
>>
>>
>> ------------------------------
>> *From:* Bryan Jeffrey <bryan.jeffrey@gmail.com>
>> *Sent:* Friday, June 30, 2017 6:57 AM
>> *To:* dev@spark.org; user@spark.apache.org; paleyl
>> *Subject:* Re: about broadcast join of base table in spark sql
>>
>> Hello.
>>
>> If you want to allow broadcast join with larger broadcasts you can set
>> spark.sql.autoBroadcastJoinThreshold to a higher value. This will cause
>> the plan to allow join despite 'A' being larger than the default threshold.
>>
>>
>> Get Outlook for Android <https://aka.ms/ghei36>
>>
>>
>>
>> From: paleyl
>> Sent: Wednesday, June 28, 10:42 PM
>> Subject: about broadcast join of base table in spark sql
>> To: dev@spark.org, user@spark.apache.org
>>
>>
>> Hi All,
>>
>>
>> Recently I meet a problem in broadcast join: I want to left join table A
>> and B, A is the smaller one and the left table, so I wrote
>>
>> A = A.join(B,A("key1") === B("key2"),"left")
>>
>> but I found that A is not broadcast out, as the shuffle size is still
>> very large.
>>
>> I guess this is a designed mechanism in spark, so could anyone please
>> tell me why it is designed like this? I am just very curious.
>>
>>
>> Best,
>>
>>
>> Paley
>>
>>
>>


-- 
Peili Lv

Department of Pattern Recognition and Intelligent System
School of Automation
Northwestern Polytechnical University
NPU Chang'an Campus
Building of Automation #130
http://paley.mydiscussion.net/

Mime
View raw message