spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From StanZhai <m...@stanzhai.site>
Subject Re:[SQL] Syntax "case when" doesn't be supported in JOIN
Date Fri, 14 Jul 2017 04:00:44 GMT
A workaround is diffcult.
You should consider merging this PR <https://github.com/apache/spark/pull/10128> into
your Spark.







 
 "wangshuang [via Apache Spark Developers List]"<ml+s1001551n21953h3@n3.nabble.com>
wroted at 2017-07-13 18:43:

I'm trying to execute hive sql on spark sql (Also on spark thriftserver), For optimizing data
skew, we use "case when" to handle null.
Simple sql as following:


SELECT a.col1
 FROM tbl1 a
 LEFT OUTER JOIN tbl2 b
 ON
         CASE
                 WHEN a.col2 IS NULL
                         TNEN cast(rand(9)*1000 - 9999999999 as string)
                 ELSE
                         a.col2 END
         = b.col3;


But I get the error:

== Physical Plan ==
org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found:
 (((CASE WHEN (a.`nav_tcdt` IS NULL) THEN CAST(((rand(9) * CAST(1000 AS DOUBLE)) - CAST(9999999999L
AS DOUBLE)) AS STRING) ELSE a.`nav_tcdt` END = c.`site_categ_id`) AND (CAST(a.`nav_tcd` AS
INT) = 9)) AND (c.`cur_flag` = 1))
in operator Join LeftOuter, (((CASE WHEN isnull(nav_tcdt#25) THEN cast(((rand(9) * cast(1000
as double)) - cast(9999999999 as double)) as string) ELSE nav_tcdt#25 END = site_categ_id#80)
&& (cast(nav_tcd#26 as int) = 9)) && (cur_flag#77 = 1))
               ;;
GlobalLimit 10
+- LocalLimit 10
   +- Aggregate [date_id#7, CASE WHEN (cast(city_id#10 as string) IN (cast(19596 as string),cast(20134
as string),cast(10997 as string)) && nav_tcdt#25 RLIKE ^[0-9]+$) THEN city_id#10 ELSE
nav_tpa_id#21 END], [date_id#7]
      +- Filter (date_id#7 = 2017-07-12)
         +- Join LeftOuter, (((CASE WHEN isnull(nav_tcdt#25) THEN cast(((rand(9) * cast(1000
as double)) - cast(9999999999 as double)) as string) ELSE nav_tcdt#25 END = site_categ_id#80)
&& (cast(nav_tcd#26 as int) = 9)) && (cur_flag#77 = 1))
            :- SubqueryAlias a
            :  +- SubqueryAlias tmp_lifan_trfc_tpa_hive
            :     +- CatalogRelation `tmp`.`tmp_lifan_trfc_tpa_hive`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
[date_id#7, chanl_id#8L, pltfm_id#9, city_id#10, sessn_id#11, gu_id#12, nav_refer_page_type_id#13,
nav_refer_page_value#14, nav_refer_tpa#15, nav_refer_tpa_id#16, nav_refer_tpc#17, nav_refer_tpi#18,
nav_page_type_id#19, nav_page_value#20, nav_tpa_id#21, nav_tpa#22, nav_tpc#23, nav_tpi#24,
nav_tcdt#25, nav_tcd#26, nav_tci#27, nav_tce#28, detl_refer_page_type_id#29, detl_refer_page_value#30,
... 33 more fields]
            +- SubqueryAlias c
               +- SubqueryAlias dim_site_categ_ext
                  +- CatalogRelation `dw`.`dim_site_categ_ext`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
[site_categ_skid#64L, site_categ_type#65, site_categ_code#66, site_categ_name#67, site_categ_parnt_skid#68L,
site_categ_kywrd#69, leaf_flg#70L, sort_seq#71L, site_categ_srch_name#72, vsbl_flg#73, delet_flag#74,
etl_batch_id#75L, updt_time#76, cur_flag#77, bkgrnd_categ_skid#78L, bkgrnd_categ_id#79L, site_categ_id#80,
site_categ_parnt_id#81]

Does spark sql not support syntax "case when" in JOIN?  Additional, my spark version is 2.2.0.
Any help would be greatly appreciated.

If you reply to this email, your message will be added to the discussion below:
http://apache-spark-developers-list.1001551.n3.nabble.com/SQL-Syntax-case-when-doesn-t-be-supported-in-JOIN-tp21953.html
To start a new topic under Apache Spark Developers List, email ml+s1001551n1h91@n3.nabble.com
 To unsubscribe from Apache Spark Developers List, click here.
 NAML







--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Re-SQL-Syntax-case-when-doesn-t-be-supported-in-JOIN-tp21960.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Mime
View raw message