spark-user mailing list archives

From Yong Zhang <java8...@hotmail.com>
Subject Re: about broadcast join of base table in spark sql
Date Sun, 02 Jul 2017 13:16:02 GMT
Then you need to tell us the Spark version and post the execution plan here, so we can help
you better.
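
For reference, a minimal sketch of how the version and the execution plan could be collected. It assumes Spark 2.x with a SparkSession; the object name, the local master, and the tiny stand-in tables A and B (with columns key1 and key2) are hypothetical and only illustrate the calls.

```scala
import org.apache.spark.sql.SparkSession

object ShowPlan {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x running locally; on 1.6 a SQLContext would be used instead.
    val spark = SparkSession.builder()
      .appName("show-plan")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for the real tables A and B.
    val A = Seq((1, "a1"), (2, "a2")).toDF("key1", "valA")
    val B = Seq((1, "b1"), (3, "b3")).toDF("key2", "valB")

    // The Spark version the job is actually running on.
    println(s"Spark version: ${spark.version}")

    // The same left join as in the original question.
    val joined = A.join(B, A("key1") === B("key2"), "left")

    // explain(true) prints the parsed, analyzed, optimized and physical plans;
    // the physical plan names the join operator that was chosen.
    joined.explain(true)

    spark.stop()
  }
}
```

The physical plan section of that output is the part worth posting to the list.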


Yong


________________________________
From: Paley Louie <paley2009@gmail.com>
Sent: Sunday, July 2, 2017 12:36 AM
To: Yong Zhang
Cc: Bryan Jeffrey; dev@spark.org; user@spark.apache.org
Subject: Re: about broadcast join of base table in spark sql

Thank you for your reply. I have tried adding a broadcast hint to the base table, but it
still is not broadcast.
On Jun 30, 2017, at 9:13 PM, Yong Zhang <java8964@hotmail.com> wrote:

Or since you already use the DataFrame API, instead of SQL, you can add the broadcast function
to force it.

https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html#broadcast(org.apache.spark.sql.DataFrame)
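
A minimal sketch of that suggestion, assuming Spark 2.x and hypothetical stand-in tables (the object name, data, and local master are illustrative only). The broadcast function from org.apache.spark.sql.functions marks a DataFrame as a broadcast candidate; whether the planner honors the hint depends on the join type and the Spark version, so the plan is printed to check.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastHint {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x running locally.
    val spark = SparkSession.builder()
      .appName("broadcast-hint")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for the real tables A (small) and B (large).
    val A = Seq((1, "a1"), (2, "a2")).toDF("key1", "valA")
    val B = Seq((1, "b1"), (3, "b3")).toDF("key2", "valB")

    // Wrap the smaller table in broadcast(...) to hint that it should be broadcast.
    val joined = broadcast(A).join(B, A("key1") === B("key2"), "left")

    // Inspect the physical plan to see which join strategy was actually selected.
    joined.explain()

    spark.stop()
  }
}
```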

Yong

________________________________
From: Bryan Jeffrey <bryan.jeffrey@gmail.com>
Sent: Friday, June 30, 2017 6:57 AM
To: dev@spark.org; user@spark.apache.org; paleyl
Subject: Re: about broadcast join of base table in spark sql

Hello.

If you want to allow broadcast joins with larger tables, you can set spark.sql.autoBroadcastJoinThreshold
to a higher value. This lets the planner consider a broadcast join even though 'A' is larger
than the default threshold.
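
A minimal sketch of raising the threshold, assuming Spark 2.x with a SparkSession. The 100 MB figure, the object name, and the local master are illustrative assumptions, not recommendations; the value is given in bytes, and the default is 10 MB.

```scala
import org.apache.spark.sql.SparkSession

object RaiseBroadcastThreshold {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x running locally.
    val spark = SparkSession.builder()
      .appName("raise-broadcast-threshold")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical new limit: 100 MB, expressed in bytes.
    val thresholdBytes = 100L * 1024 * 1024
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", thresholdBytes.toString)

    // Read the setting back to confirm what the session will use.
    println(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))

    spark.stop()
  }
}
```

The same setting can also be passed at submit time with --conf spark.sql.autoBroadcastJoinThreshold=<bytes>.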



From: paleyl
Sent: Wednesday, June 28, 10:42 PM
Subject: about broadcast join of base table in spark sql
To: dev@spark.org, user@spark.apache.org


Hi All,


Recently I ran into a problem with broadcast join: I want to left join tables A and B, where A
is the smaller one and the left table, so I wrote

A = A.join(B,A("key1") === B("key2"),"left")

but I found that A is not broadcast, as the shuffle size is still very large.

I guess this is by design in Spark, so could anyone please tell me why it is designed
this way? I am just very curious.


Best,


Paley

