drill-user mailing list archives

From Steven Phillips <sphill...@maprtech.com>
Subject Re: Parquet File Weirdness
Date Fri, 03 Apr 2015 20:23:07 GMT
Parquet has a few primitive types, one of which is the binary array. These
primitive types are used to store different "converted types". For example,
one of the converted types that uses the binary array is the "UTF8" string. I
believe that the Parquet files you are querying do not have the "converted
type" set for the columns, so Drill does not know how to interpret the
columns. It therefore treats them as VARBINARY and does not convert them to
VARCHAR. In Hive, the fact that they represent strings is stored in the
metastore, so Hive does not have this problem.

To display the data correctly in Drill, you'll need to cast the columns to
varchar, e.g.:

select cast(column as varchar(255)) ...
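
A slightly fuller sketch of the same idea, in case it helps. The file path
and the column names (sender_email, sender_domain) are made up for
illustration, so substitute your own. The second column uses Drill's
CONVERT_FROM function with the 'UTF8' encoding, which should also decode the
bytes and does not impose a length limit the way varchar(255) does:

```sql
-- Hypothetical path and column names; replace with your own.
SELECT
  CAST(sender_email AS VARCHAR(255)) AS sender_email,   -- cast the binary column to a string
  CONVERT_FROM(sender_domain, 'UTF8') AS sender_domain  -- decode the bytes as UTF-8
FROM dfs.`/path/to/parquet_table`
LIMIT 10;
```

If the values come back as readable strings rather than [B@... references,
the columns were plain binary without the UTF8 converted type.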

On Fri, Apr 3, 2015 at 12:34 PM, Andries Engelbrecht <
aengelbrecht@maprtech.com> wrote:

> Are you reading the data using the Hive Storage plugin for Drill and using
> the Metastore, or are you directly querying the parquet files on the
> filesystem with Drill?
>
>
> —Andries
>
>
> On Apr 3, 2015, at 12:05 PM, John Omernik <john@omernik.com> wrote:
>
> > I have a table in Hive (no partitions, single level, stored as PARQUET,
> > hive-0.13). When I query it in Hive, it works fine. When I run a
> > count(*) on it in Drill, it works (fast), but when I run a query, it seems
> > to return the same number of results, but they look like this...
> > thoughts? (These should be strings with emails, domains, etc.)
> >
> > [B@4d8c55fe | [B@3861be78 | [B@191fd533 | [B@78e61427 | [B@49354a73 |
> > [B@49aae991 |
>
>


-- 
 Steven Phillips
 Software Engineer

 mapr.com
