drill-user mailing list archives

From Vince Gonzalez <vince.gonza...@gmail.com>
Subject DRILL-3290
Date Thu, 27 Aug 2015 14:43:46 GMT
DRILL-3290 aims to add support for complex Hive types, and it looks to me
like it's targeted for 1.2.0.

As I understand it, supporting Hive complex types means that if I create a
Hive table stored as, say, Parquet with a MAP column, I should be able to
query it in Drill the way I'd expect.

Currently, when I create a Hive table with complex types, Drill fails to
query it through the Hive plugin because the plugin doesn't support those
types:

0: jdbc:drill:> select * from hive.complex_parquet;
Error: SYSTEM ERROR: RuntimeException: Unsupported Hive data type LIST.
Following Hive data types are supported in Drill for querying: BOOLEAN,
BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, BINARY, DECIMAL,
STRING, and VARCHAR

Fragment 0:0

[Error Id: f783df3d-7f77-4170-b0e7-aee9ba7d27c7 on ip-172-16-2-200:31010]
(state=,code=0)


I can bypass Hive and query the files directly, but the Parquet files
written by Hive have a schema that's less intuitive to query:

0: jdbc:drill:> select * from dfs.`/user/hive/warehouse/complex_parquet`;
+------------+-----------+--------------------------------------------------------------+--------------------------------------------------------------------------+
| firstname  | lastname  |                           children                           |                                 parents                                  |
+------------+-----------+--------------------------------------------------------------+--------------------------------------------------------------------------+
| Vince      | Gonzalez  | {"bag":[{"array_element":"son1"},{"array_element":"son2"}]}  | {"map":[{"key":"Mother","value":"mom"},{"key":"Father","value":"dad"}]}  |
+------------+-----------+--------------------------------------------------------------+--------------------------------------------------------------------------+
1 row selected (0.162 seconds)

Can I interpret "support for Hive complex types" to mean that Drill would
be able to query the above Hive table without my having to deal with the
"bag" and "map" wrapper keys?
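To make the question concrete, here is a rough Python sketch of the
reshaping I'd hope native support would give me. It is not Drill code and
not how DRILL-3290 implements anything. The helper names (unwrap_hive_list,
unwrap_hive_map, unwrap_row) are made up for illustration, and I'm assuming
the "bag"/"array_element" and "map"/"key"/"value" wrappers seen in the
output above are how Hive lays out every LIST and MAP column in Parquet.

```python
import json

# Hypothetical helpers. Assumption: Hive writes every LIST column as
# {"bag": [{"array_element": x}, ...]} and every MAP column as
# {"map": [{"key": k, "value": v}, ...]}, matching the dfs output above.

def unwrap_hive_list(col):
    """Turn {"bag": [{"array_element": x}, ...]} into [x, ...]."""
    return [item["array_element"] for item in col["bag"]]

def unwrap_hive_map(col):
    """Turn {"map": [{"key": k, "value": v}, ...]} into {k: v, ...}."""
    return {entry["key"]: entry["value"] for entry in col["map"]}

def unwrap_row(row, list_cols, map_cols):
    """Return a shallow copy of row with the named LIST/MAP columns unwrapped."""
    out = dict(row)
    for name in list_cols:
        out[name] = unwrap_hive_list(row[name])
    for name in map_cols:
        out[name] = unwrap_hive_map(row[name])
    return out

# The single row returned by the dfs query above.
raw_row = {
    "firstname": "Vince",
    "lastname": "Gonzalez",
    "children": json.loads('{"bag":[{"array_element":"son1"},{"array_element":"son2"}]}'),
    "parents": json.loads('{"map":[{"key":"Mother","value":"mom"},{"key":"Father","value":"dad"}]}'),
}

print(unwrap_row(raw_row, list_cols=["children"], map_cols=["parents"]))
# {'firstname': 'Vince', 'lastname': 'Gonzalez', 'children': ['son1', 'son2'], 'parents': {'Mother': 'mom', 'Father': 'dad'}}
```

In other words, I'd expect children to come back as a plain list and
parents as a plain key/value map, the way they were declared in the Hive DDL.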

Can anyone say how likely this is to actually be in 1.2.0?

I put the Hive DDL for the above example here:
https://gist.github.com/vicenteg/d48fb1a9cb70b1b592f4
