drill-dev mailing list archives

From "Jason Altekruse (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-649) Unable to read dictionary encoded parquet file generated from impala or avro
Date Mon, 12 May 2014 16:33:16 GMT

    [ https://issues.apache.org/jira/browse/DRILL-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995219#comment-13995219 ]

Jason Altekruse commented on DRILL-649:
---------------------------------------

Sorry I didn't get back to you last week. We will still be doing vector copies for non-dictionary
encoded values. One frustrating thing is that much of the reason we run into such frequent
problems is that we are running tests on tiny files, where every column has a small enough
set of values to be stored more efficiently as a dictionary. I don't know the common use
cases, but I think that with 1 GB row groups it will actually be fairly rare for an int or
long column to be limited to a dictionary of 50,000 values.
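
To illustrate the point about tiny test files, here is a rough Python sketch. It assumes the writer simply falls back from dictionary encoding once a column chunk has more than 50,000 distinct values (the figure mentioned above). The function name and the sample data are made up for the example, and how an actual Parquet writer makes this decision is not modelled here.

```python
import random

# Figure quoted in the comment above; treated here as a hard cap on
# distinct values per column chunk (an assumption for illustration).
DICTIONARY_LIMIT = 50_000


def would_dictionary_encode(values, limit=DICTIONARY_LIMIT):
    """Return True if `values` has at most `limit` distinct entries."""
    distinct = set()
    for v in values:
        distinct.add(v)
        if len(distinct) > limit:
            # Too many distinct values: a dictionary no longer fits the cap.
            return False
    return True


rng = random.Random(0)

# A tiny test file: 1,000 random ints can never exceed 1,000 distinct values.
tiny_test_column = [rng.randrange(2**31) for _ in range(1_000)]

# Stand-in for an int column in a large row group with many distinct values.
large_column = range(200_000)

print(would_dictionary_encode(tiny_test_column))
print(would_dictionary_encode(large_column))
```

Under this assumption the tiny column always qualifies for a dictionary no matter how its values are distributed, which is why small test files keep exercising the dictionary path.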

> Unable to read dictionary encoded parquet file generated from impala or avro
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-649
>                 URL: https://issues.apache.org/jira/browse/DRILL-649
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Steven Phillips
>            Assignee: Jason Altekruse
>         Attachments: nation.parquet
>
>
> support for dictionary encoding was recently added, but it looks like some dictionary
> encoded files are still unreadable by drill. For example, the parquet file created from an
> avro file attached to DRILL-389 still fails.
> I also created a simple parquet file from impala, which also fails to read.
> I will attach the file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
