hive-issues mailing list archives

From Sergio Peña (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]
Date Thu, 14 May 2015 15:43:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543878#comment-14543878 ]

Sergio Peña commented on HIVE-9863:
-----------------------------------

I ran the following commands through TestSparkCliDriver on Hive 1.3.0 (with Parquet 1.6.0)
and Hive 1.1.0 (with Parquet 1.6.0rc3), and both versions work correctly, so I cannot reproduce
the issue yet. Which version are you running?

{noformat}
set hive.compute.query.using.stats=false;

create table text(key int, value string);
load data local inpath '/opt/local/hive/upstream/data/files/kv1.txt' overwrite into table text;

create table parquet(key int, value string) stored as parquet;
insert overwrite table parquet select * from text;

select count(*) from parquet;
select * from parquet limit 2;
{noformat}



> Querying parquet tables fails with IllegalStateException [Spark Branch]
> -----------------------------------------------------------------------
>
>                 Key: HIVE-9863
>                 URL: https://issues.apache.org/jira/browse/HIVE-9863
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>
> This does not necessarily happen only in the Spark branch. Queries such as select count(*) from table_name
> fail with the following error:
> {code}
> hive> select * from content limit 2;
> OK
> Failed with exception java.io.IOException:java.lang.IllegalStateException: All the offsets
> listed in the split should be found in the file. expected: [4, 4] found: [BlockMetaData{69644,
> 881917418 [ColumnMetaData{GZIP [guid] BINARY  [PLAIN, BIT_PACKED], 4}, ColumnMetaData{GZIP
> [collection_name] BINARY  [PLAIN_DICTIONARY, BIT_PACKED], 389571}, ColumnMetaData{GZIP [doc_type]
> BINARY  [PLAIN_DICTIONARY, BIT_PACKED], 389790}, ColumnMetaData{GZIP [stage] INT64  [PLAIN_DICTIONARY,
> BIT_PACKED], 389887}, ColumnMetaData{GZIP [meta_timestamp] INT64  [RLE, PLAIN_DICTIONARY,
> BIT_PACKED], 397673}, ColumnMetaData{GZIP [doc_timestamp] INT64  [RLE, PLAIN_DICTIONARY, BIT_PACKED],
> 422161}, ColumnMetaData{GZIP [meta_size] INT32  [RLE, PLAIN_DICTIONARY, BIT_PACKED], 460215},
> ColumnMetaData{GZIP [content_size] INT32  [RLE, PLAIN_DICTIONARY, BIT_PACKED], 521728}, ColumnMetaData{GZIP
> [source] BINARY  [RLE, PLAIN, BIT_PACKED], 683740}, ColumnMetaData{GZIP [delete_flag] BOOLEAN
> [RLE, PLAIN, BIT_PACKED], 683787}, ColumnMetaData{GZIP [meta] BINARY  [RLE, PLAIN, BIT_PACKED],
> 683834}, ColumnMetaData{GZIP [content] BINARY  [RLE, PLAIN, BIT_PACKED], 6992365}]}] out of:
> [4, 129785482, 260224757] in range 0, 134217728
> Time taken: 0.253 seconds
> hive> 
> {code}
> I can reproduce the problem in either local or yarn-cluster mode. It seems to happen with MR as well,
> so I suspect this is a Parquet problem.
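
The error text reads like a consistency check: the split carries a list of row-group starting offsets (here [4, 4]), and the reader expects each one to match a row group in the file footer (here starting at 4, 129785482 and 260224757). The sketch below is a simplified Python model of such a check, not the actual Parquet reader code. It assumes the split's offsets are collected into a set before matching, while the final comparison counts every listed entry. Under that assumption, a split that lists offset 4 twice fails even though that row group exists. The names `Block` and `select_row_groups` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical stand-in for a row group; only its starting offset is modeled."""
    starting_pos: int


def select_row_groups(split_offsets: List[int], footer_blocks: List[Block],
                      start: int, end: int) -> List[Block]:
    """Keep the footer blocks named by the split, or fail if any entry is unmatched.

    Assumption: offsets are collected into a set, so a duplicated offset can
    match at most one block, while the length check still counts every entry.
    """
    wanted = set(split_offsets)
    found = [b for b in footer_blocks if b.starting_pos in wanted]
    if len(found) != len(split_offsets):
        all_offsets = [b.starting_pos for b in footer_blocks]
        # The real reader throws java.lang.IllegalStateException; ValueError stands in here.
        raise ValueError(
            "All the offsets listed in the split should be found in the file."
            f" expected: {split_offsets}"
            f" found: {[b.starting_pos for b in found]}"
            f" out of: {all_offsets}"
            f" in range {start}, {end}"
        )
    return found


if __name__ == "__main__":
    # Row-group offsets and split range copied from the reported error.
    footer = [Block(4), Block(129785482), Block(260224757)]
    start, end = 0, 134217728  # 134217728 bytes = 128 MiB

    # Listing offset 4 once: one entry, one matching block, so the check passes.
    print(len(select_row_groups([4], footer, start, end)))  # 1

    # Listing offset 4 twice (as in the report): two entries, one match, so it fails.
    try:
        select_row_groups([4, 4], footer, start, end)
    except ValueError as err:
        print(err)
```

If the real check behaves like this model, the failure would point at how the split's offset list was built (a duplicated entry) rather than at row groups missing from the file. That would still need confirming against the actual reader and split-generation code.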



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
