hive-issues mailing list archives

From "Prasanth Jayachandran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-11102) ReaderImpl: getColumnIndicesFromNames does not work for ACID tables
Date Wed, 01 Jul 2015 18:47:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610808#comment-14610808 ]

Prasanth Jayachandran commented on HIVE-11102:
----------------------------------------------

[~sershe] and [~gopalv]: getRawDataSizeOfColumns was never intended to be used inside Hive
at the time of writing. It was added purely as a convenience method for tools that use ORC outside
of Hive, such as Pig. The reason is that all other tools write the actual column names,
whereas Hive writes internal names, which is weird. Hive uses the getRawDataSizeFromColIndices method
to get the raw data size of projected columns (used by ANALYZE and StatsTask). I am going
to put up another patch for the uncompressed size in ORC splits, which will not use the getRawDataSizeOfColumns
interface. The reason we are currently seeing these logs is this line in OrcInputFormat:

{code}
List<String> projCols = ColumnProjectionUtils.getReadColumnNames(context.conf);
{code}

This is actually dead code that does not do anything, so it is safe to ignore these warnings
for now.
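
To illustrate the naming mismatch described above, here is a minimal, self-contained sketch. It is not the ReaderImpl code: the class ColumnLookupSketch and the helper indicesFromNames are hypothetical, and the internal names (_col0, _col1, ...) are an assumption about what Hive writes into the file. It only shows why a name-based lookup resolves nothing when the file carries internal names, whereas an index-based lookup does not depend on names at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnLookupSketch {

    // Hypothetical helper (not the ORC API): resolve projected column names
    // against the field names stored in a file. Unknown names are skipped.
    static List<Integer> indicesFromNames(List<String> fileFieldNames,
                                          List<String> projected) {
        List<Integer> indices = new ArrayList<>();
        for (String name : projected) {
            int idx = fileFieldNames.indexOf(name);
            if (idx >= 0) {
                indices.add(idx);
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        List<String> projected = Arrays.asList("id", "name");

        // File written by an external tool: actual column names are stored.
        List<String> externalFile = Arrays.asList("id", "name", "ts");

        // File written by Hive: internal names are stored
        // (assumed here to look like _col0, _col1, ...).
        List<String> hiveFile = Arrays.asList("_col0", "_col1", "_col2");

        System.out.println(indicesFromNames(externalFile, projected)); // [0, 1]
        System.out.println(indicesFromNames(hiveFile, projected));     // []
    }
}
```

In this sketch the name-based path silently finds nothing for the Hive-written layout, which is why a method that takes column indices is the better fit inside Hive.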

> ReaderImpl: getColumnIndicesFromNames does not work for ACID tables
> -------------------------------------------------------------------
>
>                 Key: HIVE-11102
>                 URL: https://issues.apache.org/jira/browse/HIVE-11102
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 1.3.0, 1.2.1, 2.0.0
>            Reporter: Gopal V
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-11102.patch
>
>
> ORC reader impl does not estimate the size of ACID data files correctly.
> {code}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0
> 	at java.util.Collections$EmptyList.get(Collections.java:3212)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
> 	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651)
> 	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:938)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:847)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:713)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
