spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-12998) Enable OrcRelation when connecting via spark thrift server
Date Wed, 27 Jan 2016 02:24:39 GMT

    [ https://issues.apache.org/jira/browse/SPARK-12998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118486#comment-15118486 ]

Apache Spark commented on SPARK-12998:
--------------------------------------

User 'rajeshbalamohan' has created a pull request for this issue:
https://github.com/apache/spark/pull/10938

> Enable OrcRelation when connecting via spark thrift server
> ----------------------------------------------------------
>
>                 Key: SPARK-12998
>                 URL: https://issues.apache.org/jira/browse/SPARK-12998
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Rajesh Balamohan
>
> When a user connects via the Spark Thrift Server to execute SQL, predicate pushdown (PPD)
> is not enabled for ORC tables. The query ends up using a MetastoreRelation, which does not
> support ORC PPD. The purpose of this JIRA is to convert MetastoreRelation to OrcRelation in
> HiveMetastoreCatalog, so that users benefit from PPD even when connecting to the Spark Thrift
> Server.
> {noformat}
> For example, for "explain select count(1) from tpch_flat_orc_1000.lineitem where
> l_shipdate = '1990-04-18'", the current plan is:
> == Physical Plan ==
> TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#17L])
> +- Exchange SinglePartition, None
>    +- WholeStageCodegen
>       :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#20L])
>       :     +- Project
>       :        +- Filter (l_shipdate#11 = 1990-04-18)
>       :           +- INPUT
>       +- HiveTableScan [l_shipdate#11], MetastoreRelation tpch_1000, lineitem, None
> It would be good to change it to OrcRelation to get PPD with ORC, which reduces the runtime
> by a large margin. With OrcRelation, the plan becomes:
>  
> == Physical Plan ==
> TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#70L])
> +- Exchange SinglePartition, None
>    +- WholeStageCodegen
>       :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#106L])
>       :     +- Project
>       :        +- Filter (_col10#64 = 1990-04-18)
>       :           +- INPUT
>       +- Scan OrcRelation[_col10#64] InputPaths: hdfs://nn:8020/apps/hive/warehouse/tpch_1000.db/lineitem, PushedFilters: [EqualTo(_col10,1990-04-18)]
> {noformat}
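The gain from the OrcRelation plan comes from the `PushedFilters: [EqualTo(_col10,1990-04-18)]` entry: ORC files record per-stripe column statistics (such as min/max), so a reader that receives the predicate can skip stripes whose value range cannot contain 1990-04-18. The following is a minimal Python sketch of that idea under simplifying assumptions (one column, stripe-level stats only). It is not Spark's or ORC's actual reader code, and the names `Stripe`, `write_stripe`, `scan_without_pushdown`, and `scan_with_pushdown` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stripe:
    """Hypothetical stand-in for an ORC stripe: one column's values plus stats."""
    rows: Tuple[str, ...]
    min_value: str  # statistics recorded at write time
    max_value: str


def write_stripe(rows: List[str]) -> Stripe:
    # Compute min/max once when the stripe is written, as a file writer would.
    return Stripe(tuple(rows), min(rows), max(rows))


def scan_without_pushdown(stripes: List[Stripe], value: str) -> Tuple[int, int]:
    """Read every stripe and filter rows afterwards (no predicate pushed down).

    Returns (matching_rows, stripes_read)."""
    count, read = 0, 0
    for stripe in stripes:
        read += 1
        count += sum(1 for v in stripe.rows if v == value)
    return count, read


def scan_with_pushdown(stripes: List[Stripe], value: str) -> Tuple[int, int]:
    """Apply a pushed EqualTo(col, value) predicate against stripe statistics.

    Stripes whose [min, max] range cannot contain `value` are skipped without
    touching their rows. Returns (matching_rows, stripes_read)."""
    count, read = 0, 0
    for stripe in stripes:
        if not (stripe.min_value <= value <= stripe.max_value):
            continue  # skipped using statistics only
        read += 1
        count += sum(1 for v in stripe.rows if v == value)
    return count, read


# ISO-8601 date strings compare correctly as plain strings.
stripes = [
    write_stripe(["1990-01-02", "1990-02-15", "1990-03-30"]),
    write_stripe(["1990-04-01", "1990-04-18", "1990-04-18"]),
    write_stripe(["1990-05-05", "1990-06-20", "1990-07-11"]),
]

print(scan_without_pushdown(stripes, "1990-04-18"))  # (2, 3)
print(scan_with_pushdown(stripes, "1990-04-18"))     # (2, 1)
```

Both scans return the same row count, but the pushdown path reads one stripe instead of three. Skipping of this kind is coarse-grained and only rules stripes out, which is consistent with the OrcRelation plan still keeping the `Filter (_col10#64 = 1990-04-18)` operator above the scan.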



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


