spark-dev mailing list archives

From ckhari4u <ckhar...@gmail.com>
Subject Distinct on Map data type -- SPARK-19893
Date Sat, 13 Jan 2018 03:02:03 GMT
I see that SPARK-19893 has been backported to Spark 2.1 and 2.0.1 as well. I
do not see a clear justification for why SPARK-19893 is important and needed.
I have a sample table that works fine with an earlier build of Spark 2.1.0.
Now that the latest build includes the backport of SPARK-19893, a DISTINCT
query on that table fails with the error:

Error in query: Cannot have map type columns in DataFrame which calls set
operations(intersect, except, etc.), but the type of column metrics is
map<string,int>;;
Distinct


*In the old build of Spark 2.1.0, I tried the following:*


create TABLE map_demo2
(
country_id BIGINT,
metrics MAP <STRING, int>
);

insert into table map_demo2 select 2,map("chaka",102) ;
insert into table map_demo2 select 3,map("chaka",102) ;
insert into table map_demo2 select 4,map("mangaa",103) ;


spark-sql> select distinct metrics from map_demo2;
18/01/12 21:55:41 WARN CryptoStreamUtils: It costs 8501 milliseconds to create the Initialization Vector used by CryptoStream
18/01/12 21:55:41 WARN CryptoStreamUtils: It costs 8503 milliseconds to create the Initialization Vector used by CryptoStream
18/01/12 21:55:41 WARN CryptoStreamUtils: It costs 8497 milliseconds to create the Initialization Vector used by CryptoStream
18/01/12 21:55:41 WARN CryptoStreamUtils: It costs 8496 milliseconds to create the Initialization Vector used by CryptoStream
{"mangaa":103}
{"chaka":102}
{"chaka":103}
Time taken: 15.331 seconds, Fetched 3 row(s)

Here the simple DISTINCT query works fine in Spark. Any thoughts on why the
DISTINCT/EXCEPT/INTERSECT operators are not supported on map data types?
The PR contains this comment:

// TODO: although map type is not orderable, technically map type should be able to be
// used in equality comparison, remove this type check once we support it.

I could not figure out what issue is caused by using the aforementioned
operators on map columns.
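
To make the question concrete, here is a minimal sketch of the kind of problem I can imagine, written in plain Python rather than Spark. It assumes (I have not verified this against the Spark source) that a map value is stored as a sequence of key/value entries and that two maps with the same entries in a different stored order would not compare as equal. The function names and sample data are hypothetical and only illustrate the idea.

```python
# Plain-Python illustration only; this is NOT Spark's implementation.
# Each "map" is modelled as a list of (key, value) entries in stored order.

def distinct_by_stored_order(rows):
    """De-duplicate rows by comparing entries in the order they are stored."""
    seen = set()
    result = []
    for entries in rows:
        key = tuple(entries)  # order-sensitive comparison key
        if key not in seen:
            seen.add(key)
            result.append(entries)
    return result


def distinct_by_sorted_entries(rows):
    """De-duplicate rows after sorting entries, so stored order is ignored."""
    seen = set()
    result = []
    for entries in rows:
        key = tuple(sorted(entries))  # canonical, order-insensitive key
        if key not in seen:
            seen.add(key)
            result.append(entries)
    return result


# Hypothetical data: the first two rows hold the same logical map.
rows = [
    [("a", 1), ("b", 2)],
    [("b", 2), ("a", 1)],
    [("c", 3)],
]

order_sensitive = distinct_by_stored_order(rows)
order_insensitive = distinct_by_sorted_entries(rows)
```

If the assumption holds, an order-sensitive comparison would report the first two rows as distinct even though they describe the same map, while a comparison on sorted entries would not. Is this the kind of issue the type check is guarding against?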




