hive-issues mailing list archives

From "Ferdinand Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17696) Vectorized reader does not seem to be pushing down projection columns in certain code paths
Date Wed, 25 Oct 2017 00:49:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16217937#comment-16217937 ]

Ferdinand Xu commented on HIVE-17696:
-------------------------------------

Two changes here:
* DataWritableReadSupport does two things in its init() method: 1) create the requested schema and 2) create the metadata. The vectorized reader only needs part one (see the sketch below).
* DataWritableReadSupport supports nested column pruning, but the vectorized path still has issues that cause qtest failures, so I disabled it in the 2nd patch.
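
A minimal sketch of that split, in Java. The class and method names below are illustrative only and are not the names used by the actual patch; the point is that the vectorized path calls only the schema-building half, while the row-mode ReadSupport path also builds metadata.

{noformat}
// Illustrative sketch only: class and method names are hypothetical and not
// part of the HIVE-17696 patch.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ReadSupportSplitSketch {

  // Part 1: derive the requested (pruned) column list from the file's columns.
  // This is the only piece the vectorized reader needs.
  static List<String> buildRequestedColumns(List<String> fileColumns,
                                            List<Integer> wantedIndexes) {
    return wantedIndexes.stream()
        .map(fileColumns::get)
        .collect(Collectors.toList());
  }

  // Part 2: build the read metadata (e.g. a serialized column list) that the
  // row-mode DataWritableReadSupport path also needs, but the vectorized
  // reader does not.
  static Map<String, String> buildMetadata(List<String> fileColumns) {
    Map<String, String> metadata = new HashMap<>();
    metadata.put("columns", String.join(",", fileColumns));
    return metadata;
  }
}
{noformat}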

> Vectorized reader does not seem to be pushing down projection columns in certain code paths
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17696
>                 URL: https://issues.apache.org/jira/browse/HIVE-17696
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Vihang Karajgaonkar
>            Assignee: Ferdinand Xu
>         Attachments: HIVE-17696.2.patch, HIVE-17696.patch
>
>
> This is the code snippet from {{VectorizedParquetRecordReader.java}}
> {noformat}
> MessageType tableSchema;
>     if (indexAccess) {
>       List<Integer> indexSequence = new ArrayList<>();
>       // Generates a sequence list of indexes
>       for(int i = 0; i < columnNamesList.size(); i++) {
>         indexSequence.add(i);
>       }
>       tableSchema = DataWritableReadSupport.getSchemaByIndex(fileSchema, columnNamesList,
>         indexSequence);
>     } else {
>       tableSchema = DataWritableReadSupport.getSchemaByName(fileSchema, columnNamesList,
>         columnTypesList);
>     }
>     indexColumnsWanted = ColumnProjectionUtils.getReadColumnIDs(configuration);
>     if (!ColumnProjectionUtils.isReadAllColumns(configuration) && !indexColumnsWanted.isEmpty()) {
>       requestedSchema =
>         DataWritableReadSupport.getSchemaByIndex(tableSchema, columnNamesList, indexColumnsWanted);
>     } else {
>       requestedSchema = fileSchema;
>     }
>     this.reader = new ParquetFileReader(
>       configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());
> {noformat}
> A couple of things to notice here:
> Most of this code is duplicated from the {{DataWritableReadSupport.init()}} method.
> The else branch passes in fileSchema instead of using tableSchema as we do in the {{DataWritableReadSupport.init()}} method. Does this cause projection columns to be missed when we read Parquet files? We should probably just reuse the ReadContext returned from {{DataWritableReadSupport.init()}} here.
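
A hedged sketch of the reuse suggested above, not the committed fix. The DataWritableReadSupport.init() overload used and the way initContext is built from the split and footer are assumptions for illustration; the rest mirrors the snippet quoted above.

{noformat}
// Sketch: reuse the ReadContext produced by DataWritableReadSupport.init()
// instead of re-deriving the requested schema in the vectorized reader.
// How initContext is constructed here is assumed for illustration.
ReadSupport.ReadContext readContext = new DataWritableReadSupport().init(initContext);
MessageType requestedSchema = readContext.getRequestedSchema();
this.reader = new ParquetFileReader(
    configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());
{noformat}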



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
