hive-issues mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Work logged] (HIVE-19103) Nested structure Projection Push Down in Hive with ORC
Date Wed, 24 Jun 2020 00:27:02 GMT

     [ https://issues.apache.org/jira/browse/HIVE-19103?focusedWorklogId=450124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450124 ]

ASF GitHub Bot logged work on HIVE-19103:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Jun/20 00:26
            Start Date: 24/Jun/20 00:26
    Worklog Time Spent: 10m 
      Work Description: github-actions[bot] closed pull request #330:
URL: https://github.com/apache/hive/pull/330


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 450124)
    Time Spent: 20m  (was: 10m)

> Nested structure Projection Push Down in Hive with ORC
> ------------------------------------------------------
>
>                 Key: HIVE-19103
>                 URL: https://issues.apache.org/jira/browse/HIVE-19103
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive, ORC
>            Reporter: Ashish Sharma
>            Assignee: Ashish Sharma
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: HIVE-19103.2.patch, HIVE-19103.3.patch, HIVE-19103.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Read only the required columns of a nested struct schema.
> Example - 
> *Current state* - 
> Schema  -  struct<a:int, b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
> Query - select c.e.f from t where c.e.f > 10;
> Current state - the entire c struct is read from the file and filtered afterwards, because the reader consults "hive.io.file.readcolumn.ids", which selects all child columns of c for reading.
> Conf -
>      _hive.io.file.readcolumn.ids  = "2"
>      hive.io.file.readNestedColumn.paths = "c.e.f"_
> Result -       
> boolean[ ] include  = [true,false,false,true,true,true,true,true]
> *Expected state* -
> Schema  -  struct<a:int, b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
> Query - select c.e.f from t where c.e.f > 10;
> Expected state - instead of reading the entire c struct from the file, read only the f column by consulting "hive.io.file.readNestedColumn.paths".
> Conf -
>      _hive.io.file.readcolumn.ids  = "2"
>      hive.io.file.readNestedColumn.paths = "c.e.f"_
> Result -       
> boolean[ ] include  = [true,false,false,true,false,true,true,false]
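
The two include arrays in the description can be reproduced with a small sketch. This is illustrative Python, not the Hive or ORC implementation and not the code in the attached patches. It assumes that type ids are assigned in pre-order with the root struct as id 0 (which matches the eight-entry arrays above) and that "hive.io.file.readcolumn.ids" holds 0-based top-level column indexes. The function names (count_types, mark_subtree, mark_path, build_include) and the nested-dict schema encoding are hypothetical, chosen only for this example.

```python
# Schema from the issue, written as a nested dict (insertion order is kept):
# struct<a:int,b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
SCHEMA = {"a": "int", "b": "bigint",
          "c": {"d": "int", "e": {"f": "int"}, "g": "string"}}


def count_types(node):
    """Number of type ids a node occupies: itself plus all descendants."""
    if isinstance(node, dict):
        return 1 + sum(count_types(child) for child in node.values())
    return 1


def mark_subtree(node, type_id, include):
    """Select a node and everything below it (ids are contiguous in pre-order)."""
    for i in range(type_id, type_id + count_types(node)):
        include[i] = True


def mark_path(node, type_id, rest, include):
    """Select only the nodes along one nested path, e.g. ["e", "f"] under c."""
    include[type_id] = True
    if not rest:
        # The path ends here, so everything below this node is needed.
        mark_subtree(node, type_id, include)
        return
    if not isinstance(node, dict):
        raise KeyError("path continues below a primitive column: %r" % rest)
    child_id = type_id + 1
    for name, child in node.items():
        if name == rest[0]:
            mark_path(child, child_id, rest[1:], include)
            return
        child_id += count_types(child)
    raise KeyError("no such field: %s" % rest[0])


def build_include(schema, column_ids, nested_paths=()):
    """Build the include array for top-level column ids and optional nested paths."""
    include = [False] * count_types(schema)
    include[0] = True  # the root struct is always selected
    paths_by_root = {}
    for path in nested_paths:
        parts = path.split(".")
        paths_by_root.setdefault(parts[0], []).append(parts[1:])
    type_id = 1
    for index, (name, child) in enumerate(schema.items()):
        if index in column_ids:
            sub_paths = paths_by_root.get(name)
            if sub_paths:
                for rest in sub_paths:
                    mark_path(child, type_id, rest, include)
            else:
                # No nested path known for this column: read the whole subtree.
                mark_subtree(child, type_id, include)
        type_id += count_types(child)
    return include


current = build_include(SCHEMA, {2})              # only readcolumn.ids = "2"
expected = build_include(SCHEMA, {2}, ["c.e.f"])  # also readNestedColumn.paths
print(current)
print(expected)
```

With only the column id, the whole subtree of c (ids 3 to 7) is selected. With the nested path, only c, e and f (ids 3, 5 and 6) are selected next to the root, so d and g are skipped.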



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
