spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Armbrust <mich...@databricks.com>
Subject Re: Spark SQL Optimization
Date Mon, 21 Mar 2016 17:36:05 GMT
It's helpful if you can include the output of EXPLAIN EXTENDED or
df.explain(true) whenever asking about query performance.

On Mon, Mar 21, 2016 at 6:27 AM, gtinside <gtinside@gmail.com> wrote:

> Hi ,
>
> I am trying to execute a simple query with join on 3 tables. When I look at
> the execution plan , it varies with position of table in the "from" clause.
> Execution plan looks more optimized when the position of table with
> predicates is specified before any other table.
>
>
> Original query :
>
> select distinct pge.portfolio_code
> from table1 pge join table2 p
> on p.perm_group = pge.anc_port_group
> join table3 uge
> on p.user_group=uge.anc_user_group
> where uge.user_name = 'user' and p.perm_type = 'TEST'
>
> Optimized query (table with predicates is moved ahead):
>
> select distinct pge.portfolio_code
> from table1 uge, table2 p, table3 pge
> where uge.user_name = 'user' and p.perm_type = 'TEST'
> and p.perm_group = pge.anc_port_group
> and p.user_group=uge.anc_user_group
>
>
> Execution plan is more optimized for the optimized query and hence the
> query
> executes faster. All the tables are being sourced from parquet files
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Optimization-tp26548.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>

Mime
View raw message