spark-user mailing list archives

From Shuo Chen <chenatu2...@gmail.com>
Subject question about spark sql join pruning
Date Tue, 08 Oct 2019 10:58:46 GMT
Hi!

I have a question about Spark SQL left join pruning. For example, I have two
tables:
table A:
create table A (
  user_id int,
  gender string,
  email string,
  phone string
)

table B:
create table B (
  user_id int,
  jobs string,
  graduate_schools string
)

If I select only columns of A from a sub-query that is A left join B, can
Spark optimize the plan to scan just table A?

Query like this:

select user_id, gender, email, phone from (
  select A.user_id as user_id,
         A.gender as gender,
         A.email as email,
         A.phone as phone
  from A left join B on A.user_id = B.user_id
)
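
For reference, this is roughly how the plan for the query above could be inspected. It is only a sketch, assuming tables A and B exist as defined earlier; the subquery alias t is added purely for illustration. If the join were pruned, the optimized plan would show no Join node and no scan of B. Note that dropping the join would only preserve the result if B.user_id is unique, since otherwise the left join can repeat rows of A.

```sql
-- Sketch: print the parsed, analyzed, optimized and physical plans.
-- The alias "t" is illustrative and not part of the original query.
EXPLAIN EXTENDED
SELECT user_id, gender, email, phone FROM (
  SELECT A.user_id AS user_id,
         A.gender  AS gender,
         A.email   AS email,
         A.phone   AS phone
  FROM A LEFT JOIN B ON A.user_id = B.user_id
) t;
```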

A more general version of the question:
Suppose I have a view defined as table A1 left join table A2 left join ... An,
and only columns of Ai, Aj, Ak are selected, where i, j, k are a subset of 1...n.

Can Spark prune the unnecessary tables from the join?

-- 
*Shuo Chen*
chenatu2006@gmail.com
