hive-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-19103) Reading required column only in nested structure schema in ORC
Date Wed, 04 Apr 2018 08:47:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-19103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425198#comment-16425198 ]

ASF GitHub Bot commented on HIVE-19103:
---------------------------------------

GitHub user ashish-kumar-sharma opened a pull request:

    https://github.com/apache/hive/pull/330

     HIVE-19103: Reading required column only in nested structure schema in ORC

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Flipkart/hive requiredColumn

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hive/pull/330.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #330
    
----
commit c7addea2d30af50bbee37665964eb60c789ed63b
Author: Aashish Kumar Sharma <aashish.s@...>
Date:   2018-04-04T08:18:03Z

    HIVE-19103: first commit

commit 386cdb6292f0459e10e0d8473dd3b4b77002e334
Author: Aashish Kumar Sharma <aashish.s@...>
Date:   2018-04-04T08:45:13Z

    HIVE-19103: second commit

----


> Reading required column only in nested structure schema in ORC
> --------------------------------------------------------------
>
>                 Key: HIVE-19103
>                 URL: https://issues.apache.org/jira/browse/HIVE-19103
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ashish Sharma
>            Assignee: Ashish Sharma
>            Priority: Major
>              Labels: pull-request-available
>
> Read only the required columns when the schema contains nested structs.
> Example:
> *Current state* - 
> Schema  -  struct<a:int, b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
> Query - select c.e.f from t where c.e.f > 10;
> Current state - the entire c struct is read from the file and then filtered, because only
> "hive.io.file.readcolumn.ids" is consulted, so all child columns of c are selected for reading.
> Conf -
>      _hive.io.file.readcolumn.ids  = "2"
>      hive.io.file.readNestedColumn.paths = "c.e.f"_
> Result -       
> boolean[] include = [true,false,false,true,true,true,true,true]
> *Expected state* -
> Schema  -  struct<a:int, b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
> Query - select c.e.f from t where c.e.f > 10;
> Expected state - instead of reading the entire c struct from the file, read only the
> f column by consulting "hive.io.file.readNestedColumn.paths".
> Conf -
>      _hive.io.file.readcolumn.ids  = "2"
>      hive.io.file.readNestedColumn.paths = "c.e.f"_
> Result -       
> boolean[] include = [true,false,false,true,false,true,true,false]
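
To make the two include arrays above easier to check, here is a minimal, self-contained Java sketch that reproduces them for the example schema. It is not the code from the pull request and does not use the Hive or ORC reader classes. The Node class and the methods assignIds, markSubtree, includeByColumnId and includeByNestedPath are hypothetical names introduced only for illustration. The one assumption taken from ORC is that type ids are assigned in pre-order, with the root struct as id 0, which matches the arrays in the description (a=1, b=2, c=3, d=4, e=5, f=6, g=7).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NestedColumnPruning {

    /** Hypothetical minimal schema node (not an ORC or Hive class). */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        int id;

        Node(String name, Node... kids) {
            this.name = name;
            children.addAll(Arrays.asList(kids));
        }
    }

    /** Assigns pre-order ids starting at 'next' and returns the next free id. */
    static int assignIds(Node n, int next) {
        n.id = next++;
        for (Node c : n.children) {
            next = assignIds(c, next);
        }
        return next;
    }

    /** Marks a node and every descendant as included. */
    static void markSubtree(Node n, boolean[] include) {
        include[n.id] = true;
        for (Node c : n.children) {
            markSubtree(c, include);
        }
    }

    /** Current state: only the top-level column index is used, so the whole subtree is read. */
    static boolean[] includeByColumnId(Node root, int total, int columnId) {
        boolean[] include = new boolean[total];
        include[root.id] = true;
        markSubtree(root.children.get(columnId), include);
        return include;
    }

    /** Expected state: follow the dotted path and mark only the nodes along it. */
    static boolean[] includeByNestedPath(Node root, int total, String path) {
        boolean[] include = new boolean[total];
        include[root.id] = true;
        Node cur = root;
        for (String part : path.split("\\.")) {
            Node next = null;
            for (Node c : cur.children) {
                if (c.name.equals(part)) {
                    next = c;
                }
            }
            if (next == null) {
                throw new IllegalArgumentException("Unknown field: " + part);
            }
            include[next.id] = true;
            cur = next;
        }
        // If the path ends at a struct, everything below it is still needed.
        markSubtree(cur, include);
        return include;
    }

    /** struct<a:int, b:bigint, c:struct<d:int, e:struct<f:int>, g:string>> */
    static Node exampleSchema() {
        Node e = new Node("e", new Node("f"));
        Node c = new Node("c", new Node("d"), e, new Node("g"));
        return new Node("root", new Node("a"), new Node("b"), c);
    }

    public static void main(String[] args) {
        Node root = exampleSchema();
        int total = assignIds(root, 0);
        // hive.io.file.readcolumn.ids = "2"
        System.out.println(Arrays.toString(includeByColumnId(root, total, 2)));
        // hive.io.file.readNestedColumn.paths = "c.e.f"
        System.out.println(Arrays.toString(includeByNestedPath(root, total, "c.e.f")));
    }
}
```

The first printed array selects c with all of its children (ids 3 to 7). The second selects only the chain c, e, f (ids 3, 5, 6), so d and g are skipped.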



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
