spark-issues mailing list archives

From "Dongjoon Hyun (Jira)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-27280) infer filters from Join's OR condition
Date Mon, 16 Mar 2020 22:54:06 GMT

     [ https://issues.apache.org/jira/browse/SPARK-27280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-27280:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                       3.1.0

> infer filters from Join's OR condition
> --------------------------------------
>
>                 Key: SPARK-27280
>                 URL: https://issues.apache.org/jira/browse/SPARK-27280
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer, SQL
>    Affects Versions: 3.1.0
>            Reporter: Song Jun
>            Priority: Major
>
> In some cases, we can infer filters from a Join condition that contains OR expressions.
> For example, TPC-DS query 48:
> {code:sql}
> select sum (ss_quantity)
>  from store_sales, store, customer_demographics, customer_address, date_dim
>  where s_store_sk = ss_store_sk
>  and  ss_sold_date_sk = d_date_sk and d_year = 2000
>  and  
>  (
>   (
>    cd_demo_sk = ss_cdemo_sk
>    and 
>    cd_marital_status = 'S'
>    and 
>    cd_education_status = 'Secondary'
>    and 
>    ss_sales_price between 100.00 and 150.00  
>    )
>  or
>   (
>   cd_demo_sk = ss_cdemo_sk
>    and 
>    cd_marital_status = 'M'
>    and 
>    cd_education_status = 'College'
>    and 
>    ss_sales_price between 50.00 and 100.00   
>   )
>  or 
>  (
>   cd_demo_sk = ss_cdemo_sk
>   and 
>    cd_marital_status = 'U'
>    and 
>    cd_education_status = '2 yr Degree'
>    and 
>    ss_sales_price between 150.00 and 200.00  
>  )
>  )
>  and
>  (
>   (
>   ss_addr_sk = ca_address_sk
>   and
>   ca_country = 'United States'
>   and
>   ca_state in ('AL', 'OH', 'MD')
>   and ss_net_profit between 0 and 2000  
>   )
>  or
>   (ss_addr_sk = ca_address_sk
>   and
>   ca_country = 'United States'
>   and
>   ca_state in ('VA', 'TX', 'IA')
>   and ss_net_profit between 150 and 3000 
>   )
>  or
>   (ss_addr_sk = ca_address_sk
>   and
>   ca_country = 'United States'
>   and
>   ca_state in ('RI', 'WI', 'KY')
>   and ss_net_profit between 50 and 25000 
>   )
>  )
> ;
> {code}
> We can infer two filters from the join's OR condition:
> {code:java}
> for customer_demographics:
> cd_marital_status in ('S', 'M', 'U') and cd_education_status in ('Secondary', 'College', '2 yr Degree')
> for store_sales:
>  (ss_sales_price between 100.00 and 150.00 or ss_sales_price between 50.00 and 100.00 or ss_sales_price between 150.00 and 200.00)
> {code}
> We can then push the two filters above down to customer_demographics and store_sales.
> A PR will be submitted soon.
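
The inference described above can be sketched outside Spark as follows. This is a minimal illustration in Python, not the Catalyst implementation: the `infer_filter` helper, the `(table, sql)` encoding of each conjunct, and the hand-built `branches` list are all assumptions made for this example. The idea is that each OR branch contributes the conjuncts that read only the target table; if any branch contributes nothing, no filter can be inferred for that table.

```python
from typing import List, Optional, Tuple

# A conjunct is (table, sql): `table` is the single table whose columns the
# predicate reads, or None when it spans several tables (e.g. a join key).
# This encoding is hypothetical and only serves the illustration.
Conjunct = Tuple[Optional[str], str]


def infer_filter(branches: List[List[Conjunct]], table: str) -> Optional[str]:
    """Return a filter on `table` implied by OR(AND(...), AND(...), ...).

    Returns None when some branch places no single-table constraint on
    `table`, because that branch could then admit any row of the table.
    """
    parts = []
    for conjuncts in branches:
        # Keep only the conjuncts of this branch that read `table` alone.
        own = [sql for tbl, sql in conjuncts if tbl == table]
        if not own:
            return None
        parts.append("(" + " AND ".join(own) + ")")
    return " OR ".join(parts)


# First OR group of the query above, encoded by hand.
JOIN_KEY: Conjunct = (None, "cd_demo_sk = ss_cdemo_sk")
CD, SS = "customer_demographics", "store_sales"
branches = [
    [JOIN_KEY, (CD, "cd_marital_status = 'S'"),
     (CD, "cd_education_status = 'Secondary'"),
     (SS, "ss_sales_price BETWEEN 100.00 AND 150.00")],
    [JOIN_KEY, (CD, "cd_marital_status = 'M'"),
     (CD, "cd_education_status = 'College'"),
     (SS, "ss_sales_price BETWEEN 50.00 AND 100.00")],
    [JOIN_KEY, (CD, "cd_marital_status = 'U'"),
     (CD, "cd_education_status = '2 yr Degree'"),
     (SS, "ss_sales_price BETWEEN 150.00 AND 200.00")],
]

print(infer_filter(branches, SS))  # filter usable on store_sales
print(infer_filter(branches, CD))  # filter usable on customer_demographics
```

For customer_demographics this sketch keeps each branch's pair of conditions together, which is tighter than the two IN lists given in the description; the IN-list form is a looser filter that the same OR condition also implies.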



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


