spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Wenchen Fan <cloud0...@gmail.com>
Subject Re: [SQL] Syntax "case when" doesn't be supported in JOIN
Date Thu, 13 Jul 2017 12:25:32 GMT
It’s not about case when, but about rand(). Non-deterministic expressions are not allowed
in join condition.

> On 13 Jul 2017, at 6:43 PM, wangshuang <cn_wss@qq.com> wrote:
> 
> I'm trying to execute hive sql on spark sql (Also on spark thriftserver), For
> optimizing data skew, we use "case when" to handle null.
> Simple sql as following:
> 
> 
> SELECT a.col1 
> FROM tbl1 a 
> LEFT OUTER JOIN tbl2 b 
> ON 
> *	CASE 
> 		WHEN a.col2 IS NULL 
> 			TNEN cast(rand(9)*1000 - 9999999999 as string) 
> 		ELSE 
> 			a.col2 END *
> 	= b.col3;
> 
> 
> But I get the error:
> 
> == Physical Plan ==
> *org.apache.spark.sql.AnalysisException: nondeterministic expressions are
> only allowed in
> Project, Filter, Aggregate or Window, found:*
> (((CASE WHEN (a.`nav_tcdt` IS NULL) THEN CAST(((rand(9) * CAST(1000 AS
> DOUBLE)) - CAST(9999999999L AS DOUBLE)) AS STRING) ELSE a.`nav_tcdt` END =
> c.`site_categ_id`) AND (CAST(a.`nav_tcd` AS INT) = 9)) AND (c.`cur_flag` =
> 1))
> in operator Join LeftOuter, (((CASE WHEN isnull(nav_tcdt#25) THEN
> cast(((rand(9) * cast(1000 as double)) - cast(9999999999 as double)) as
> string) ELSE nav_tcdt#25 END = site_categ_id#80) && (cast(nav_tcd#26 as int)
> = 9)) && (cur_flag#77 = 1))
>               ;;
> GlobalLimit 10
> +- LocalLimit 10
>   +- Aggregate [date_id#7, CASE WHEN (cast(city_id#10 as string) IN
> (cast(19596 as string),cast(20134 as string),cast(10997 as string)) &&
> nav_tcdt#25 RLIKE ^[0-9]+$) THEN city_id#10 ELSE nav_tpa_id#21 END],
> [date_id#7]
>      +- Filter (date_id#7 = 2017-07-12)
>         +- Join LeftOuter, (((CASE WHEN isnull(nav_tcdt#25) THEN
> cast(((rand(9) * cast(1000 as double)) - cast(9999999999 as double)) as
> string) ELSE nav_tcdt#25 END = site_categ_id#80) && (cast(nav_tcd#26 as int)
> = 9)) && (cur_flag#77 = 1))
>            :- SubqueryAlias a
>            :  +- SubqueryAlias tmp_lifan_trfc_tpa_hive
>            :     +- CatalogRelation `tmp`.`tmp_lifan_trfc_tpa_hive`,
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [date_id#7, chanl_id#8L,
> pltfm_id#9, city_id#10, sessn_id#11, gu_id#12, nav_refer_page_type_id#13,
> nav_refer_page_value#14, nav_refer_tpa#15, nav_refer_tpa_id#16,
> nav_refer_tpc#17, nav_refer_tpi#18, nav_page_type_id#19, nav_page_value#20,
> nav_tpa_id#21, nav_tpa#22, nav_tpc#23, nav_tpi#24, nav_tcdt#25, nav_tcd#26,
> nav_tci#27, nav_tce#28, detl_refer_page_type_id#29,
> detl_refer_page_value#30, ... 33 more fields]
>            +- SubqueryAlias c
>               +- SubqueryAlias dim_site_categ_ext
>                  +- CatalogRelation `dw`.`dim_site_categ_ext`,
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [site_categ_skid#64L,
> site_categ_type#65, site_categ_code#66, site_categ_name#67,
> site_categ_parnt_skid#68L, site_categ_kywrd#69, leaf_flg#70L, sort_seq#71L,
> site_categ_srch_name#72, vsbl_flg#73, delet_flag#74, etl_batch_id#75L,
> updt_time#76, cur_flag#77, bkgrnd_categ_skid#78L, bkgrnd_categ_id#79L,
> site_categ_id#80, site_categ_parnt_id#81]
> 
> Does spark sql not support syntax "case when" in JOIN?  Additional, my spark
> version is 2.2.0.
> Any help would be greatly appreciated.
> 
> 
> 
> 
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/SQL-Syntax-case-when-doesn-t-be-supported-in-JOIN-tp21953.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> 


---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Mime
View raw message