spark-user mailing list archives

From Xiaoye Sun <sunxiaoy...@gmail.com>
Subject Re: about broadcast join of base table in spark sql
Date Sun, 02 Jul 2017 05:55:36 GMT
You may need to check whether Spark can determine the size of your table. If
Spark cannot get the table size, it will not broadcast the table.
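
A quick way to see what the planner decides is to print the physical plan. The
following is only a minimal sketch under stated assumptions: it uses a local
SparkSession, two toy DataFrames `a` and `b` as stand-ins for tables A and B,
and a hypothetical catalog table name `my_table` in the comment about
statistics. As far as I know, for a left outer join the broadcast hash join can
only build (broadcast) the right side, so a hint on the left table would not
take effect; the two `explain()` calls let you verify that on your own version.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for experimentation only.
    val spark = SparkSession.builder()
      .appName("broadcast-join-check")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy stand-ins for tables A and B from the thread.
    val a = Seq((1, "a1"), (2, "a2")).toDF("key1", "va")
    val b = Seq((1, "b1"), (3, "b3")).toDF("key2", "vb")

    // Turn off size-based automatic broadcasting so that only the
    // explicit hints below influence the chosen join strategy.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    // For a catalog table (hypothetical name `my_table`), statistics can be
    // collected so that the planner knows the table size:
    //   spark.sql("ANALYZE TABLE my_table COMPUTE STATISTICS")

    // Hint on the LEFT side of a left outer join.
    val aHinted = broadcast(a)
    val hintLeft = aHinted.join(b, aHinted("key1") === b("key2"), "left")
    hintLeft.explain()

    // Hint on the RIGHT side of the same left outer join.
    val hintRight = a.join(broadcast(b), a("key1") === b("key2"), "left")
    hintRight.explain()

    spark.stop()
  }
}
```

In the printed plans, look for the join operator name: a broadcast shows up as
BroadcastHashJoin, while a shuffle-based join typically shows up as
SortMergeJoin. If the first plan is not a broadcast join but the second one is,
the limitation is the join type rather than the table size.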

On Sat, Jul 1, 2017 at 11:37 PM Paley Louie <paley2009@gmail.com> wrote:

> Thank you for your reply. I have tried adding a broadcast hint to the base
> table, but it still is not broadcast.
>
> On Jun 30, 2017, at 9:13 PM, Yong Zhang <java8964@hotmail.com> wrote:
>
> Or, since you already use the DataFrame API instead of SQL, you can apply
> the broadcast function to force it.
>
>
> https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/functions.html#broadcast(org.apache.spark.sql.DataFrame)
>
> Yong
>
> ------------------------------
> *From:* Bryan Jeffrey <bryan.jeffrey@gmail.com>
> *Sent:* Friday, June 30, 2017 6:57 AM
> *To:* dev@spark.org; user@spark.apache.org; paleyl
> *Subject:* Re: about broadcast join of base table in spark sql
>
> Hello.
>
> If you want to allow broadcast joins with larger broadcast tables, you can
> set spark.sql.autoBroadcastJoinThreshold to a higher value. This lets the
> planner choose a broadcast join even when 'A' is larger than the default
> threshold.
>
>
> From: paleyl
> Sent: Wednesday, June 28, 10:42 PM
> Subject: about broadcast join of base table in spark sql
> To: dev@spark.org, user@spark.apache.org
>
>
> Hi All,
>
>
> Recently I ran into a problem with broadcast join: I want to left join
> tables A and B, where A is the smaller one and the left table, so I wrote
>
> A = A.join(B,A("key1") === B("key2"),"left")
>
> but I found that A is not broadcast, as the shuffle size is still very
> large.
>
> I guess this is by design in Spark, so could anyone please tell me why it
> is designed this way? I am just very curious.
>
>
> Best,
>
>
> Paley
>
>
>
