drill-user mailing list archives

From Venki Korukanti <venki.koruka...@gmail.com>
Subject Re: DRILL-3290
Date Thu, 27 Aug 2015 16:30:32 GMT
I started looking into this a few weeks back, but haven't made much progress
on the implementation.

The Hive MAP type and the Drill MAP type are different. Hive MAP is a pure
(key, value) structure, while Drill MAP is more like the Hive STRUCT type.
Both Hive MAP and Hive STRUCT are going to be mapped to the Drill MAP type.
Hive UNION is another type that needs some discussion on how to handle it.
The Hive LIST type is straightforward to map to Drill repeated types.
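
To make the shapes concrete, here is a minimal sketch in plain Python (not
Drill or Hive code; the helper names unwrap_list and unwrap_map are made up
for illustration). It uses the row from the quoted dfs query output below and
only shows the logical list and (key, value) shapes sitting under the "bag"
and "map" wrappers. How Drill will actually represent them is still open.

```python
# Illustrative only: plain Python, not Drill or Hive code.
# The row is copied from the dfs query output quoted further down.
row = {
    "firstname": "Vince",
    "lastname": "Gonzalez",
    "children": {"bag": [{"array_element": "son1"}, {"array_element": "son2"}]},
    "parents": {"map": [{"key": "Mother", "value": "mom"},
                        {"key": "Father", "value": "dad"}]},
}


def unwrap_list(col):
    """Hive LIST as written to parquet -> plain list (a repeated value)."""
    return [item["array_element"] for item in col["bag"]]


def unwrap_map(col):
    """Hive MAP as written to parquet -> plain (key, value) pairs."""
    return {entry["key"]: entry["value"] for entry in col["map"]}


children = unwrap_list(row["children"])
parents = unwrap_map(row["parents"])
print(children)
print(parents)
```

Note that the map keys here ("Mother", "Father") are data, not a fixed schema,
which is the part that does not line up with a STRUCT-like Drill MAP.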

We may not get to work on this in 1.2.0. Please vote on the JIRA, which will
help us plan for future releases.

On Thu, Aug 27, 2015 at 7:43 AM, Vince Gonzalez <vince.gonzalez@gmail.com>
wrote:

> Drill 3290 aims to add support for complex Hive types, and looks to me like
> it's targeted for 1.2.0.
>
> The way I'm understanding it, supporting hive complex types means that if I
> create a hive table, stored say as parquet with a MAP column, I should be
> able to query it in Drill in the way we'd expect.
>
> Currently, when I create a Hive table with complex types, Drill fails to
> query the table using the hive plugin because it lacks the support for the
> types.
>
> 0: jdbc:drill:> select * from hive.complex_parquet;
> Error: SYSTEM ERROR: RuntimeException: Unsupported Hive data type LIST.
> Following Hive data types are supported in Drill for querying: BOOLEAN,
> BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, BINARY, DECIMAL,
> STRING, and VARCHAR
>
> Fragment 0:0
>
> [Error Id: f783df3d-7f77-4170-b0e7-aee9ba7d27c7 on ip-172-16-2-200:31010]
> (state=,code=0)
>
>
> I can go around Hive and query the files directly, but the hive-created
> parquet has a schema that's not as intuitive to query:
>
> 0: jdbc:drill:> select * from dfs.`/user/hive/warehouse/complex_parquet`;
>
> +------------+-----------+--------------------------------------------------------------+--------------------------------------------------------------------------+
> | firstname  | lastname  |                           children                           |                                 parents                                  |
> +------------+-----------+--------------------------------------------------------------+--------------------------------------------------------------------------+
> | Vince      | Gonzalez  | {"bag":[{"array_element":"son1"},{"array_element":"son2"}]}  | {"map":[{"key":"Mother","value":"mom"},{"key":"Father","value":"dad"}]}  |
> +------------+-----------+--------------------------------------------------------------+--------------------------------------------------------------------------+
> 1 row selected (0.162 seconds)
>
> Can I interpret "support for Hive complex types" to mean that Drill would
> be able to query the above hive table without having to deal with the "bag"
> and "map" keys?
>
> Can anyone say how likely this is to actually be in 1.2.0?
>
> I put the hive DDL for the above example here:
> https://gist.github.com/vicenteg/d48fb1a9cb70b1b592f4
>
