drill-user mailing list archives

From "qihuang.zheng"<qihuang.zh...@fraudmetrix.cn>
Subject Reading Parquet's map column
Date Mon, 10 Aug 2015 09:43:07 GMT
Hi Drillers,
I got hive_alltypes.parquet from here: https://issues.apache.org/jira/browse/DRILL-2005. I created
a table in Hive and queried it:
hive> desc alltypesparquet;
OK
c1          int
c2          boolean
c3          double
c4          string
c5          array<int>
c6          map<int,string>
c7          map<string,string>
...
hive> select c6 from alltypesparquet;
{1:"x",2:"y"}


and I can easily get the values by key in a single row:
hive> select c6[1],c6[2] from alltypesparquet;


In Drill, I query like this:
0: jdbc:drill:zk=local> select t.c6 from dfs.`/home/qihuang.zheng/hive_alltypes.parquet` t;
{"map":[{"key":1,"value":"eA=="},{"key":2,"value":"eQ=="}]}


Not only has the structure changed, but the string values x and y have also become eA== and eQ==.
Structure: t.c6.map is now an array, so I can't query t.c6.key directly.
I would have to use t.c6.map[0].key, but I need all the keys, not just the first one.
The only solution I can think of for now is to use flatten:


0: jdbc:drill:zk=local> select tb.flat.key,tb.flat.`value` from(select flatten(t.c6.map) flat
from dfs.`/home/qihuang.zheng/hive_alltypes.parquet` t ) tb;
+---------+--------------+
| EXPR$0  | EXPR$1       |
+---------+--------------+
| 1       | [B@2cf5c838  |
| 2       | [B@3c2beb97  |
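
A note on the values: `[B@...` is how Java prints a byte array by default, and eA== and eQ== look like base64 text. That suggests the map values are being read as binary rather than as strings. A minimal sketch to check the base64 part, in Python and entirely outside Drill (the helper name `decode_value` is only illustrative):

```python
import base64


def decode_value(encoded: str) -> str:
    """Decode a base64 string, as shown in Drill's JSON output, into UTF-8 text."""
    return base64.b64decode(encoded).decode("utf-8")


# The two values taken from the Drill output of `select t.c6 ...` above.
drill_values = ["eA==", "eQ=="]
decoded = [decode_value(v) for v in drill_values]
print(decoded)
```

If the decoded values match the original x and y, the data itself is intact and only the type interpretation differs. Within Drill, applying CONVERT_FROM(tb.flat.`value`, 'UTF8') to the flattened value may be worth trying; this is an untested suggestion for this file.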


This looks overly complicated: one row has now become two rows, and I would then have to use a SQL
row-to-column transformation to get the result back into a single row, which is far too
complicated.
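
One workaround is to fold the flattened rows back into a single row on the client side instead of in SQL. A minimal sketch in Python, assuming the client receives each flattened row as a (key, value) pair with the value as raw bytes; the function name `rows_to_single_row` and the sample data are hypothetical:

```python
def rows_to_single_row(rows):
    """Fold flattened (key, value) rows into one dict, decoding byte values as UTF-8."""
    result = {}
    for key, value in rows:
        # Assumption: string values arrive as bytes, as the [B@... output suggests.
        if isinstance(value, (bytes, bytearray)):
            value = bytes(value).decode("utf-8")
        result[key] = value
    return result


# Hypothetical rows mirroring the two flattened rows in the Drill output.
flattened = [(1, b"x"), (2, b"y")]
single_row = rows_to_single_row(flattened)
print(single_row)
```

This only moves the pivot out of the query; it does not explain why Drill exposes the map as an array of key/value entries.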


Does anyone have a good solution? And why is Drill's map structure different from Hive's?
Thanks!






qihuang.zheng