spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From 任弘迪 <ryan.hd....@gmail.com>
Subject Re: spark sql: full outer join optimization
Date Fri, 03 Mar 2017 07:53:40 GMT
Thanks yong.

On Wed, Feb 22, 2017 at 9:45 PM, Yong Zhang <java8964@hotmail.com> wrote:

> In Spark SQL, the broadcast join is triggered by "spark.sql.conf.
> autoBroadcastJoinThreshold", check what is your spark session.
>
>
> The more important point is that Spark normally uses the statistics of the
> table, and if the table size is less than above setting, then broadcast
> join will be considered. So make sure you update your statistics in your
> table.
>
>
> Another limitation is for sub query. If you use sub query, it is very hard
> for Catalyst to understand what is the estimate size of your sub query.
> Spark 2.0 just start to implement some CBO logic, but it is still in very
> early stage.
>
>
> If your case, if you are confident that Broadcast join is the way to go,
> you can use the Broadcast function call in DF, to force the broadcast join.
>
>
> Yong
>
> ------------------------------
> *From:* Hongdi Ren <ryan.hd.ren@gmail.com>
> *Sent:* Wednesday, February 22, 2017 1:32 AM
> *To:* user@spark.apache.org
> *Subject:* spark sql: full outer join optimization
>
>
> Hi all,
>
>
>
> On spark 1.6, I’ve written a full outer join on no key which causes a
> Cartesian product. The left side is about 10 thousand rows while the right
> side is about 80 million.
>
>
>
> My question is, why not spark sql optimize it to a mapper join (broadcast
> the small table) automatically? Is there any parameter/hint I can set
> rather than manually write broadcast code?
>
>
>
> Sql example:
>
>   select * from (select id from a) user full outer join (select id from b)
> item
>
>   (as the name implies, I’m doing a collaborative filtering with implicit
> feedback, thus the whole user-item matrix is needed)
>
>
>
> Thanks for any help.
>
>
>
>
>
> Attach the query plan:
>
>
>
>

Mime
View raw message