hive-issues mailing list archives

From "Yannik Zuehlke (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-11327) HiveQL to HBase - Predicate Pushdown for composite key not working
Date Tue, 21 Jul 2015 15:05:05 GMT

     [ https://issues.apache.org/jira/browse/HIVE-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yannik Zuehlke updated HIVE-11327:
----------------------------------
    Tags: hive, predicatepushdown, hbase  (was: hive predicatepushdown)

> HiveQL to HBase - Predicate Pushdown for composite key not working
> ------------------------------------------------------------------
>
>                 Key: HIVE-11327
>                 URL: https://issues.apache.org/jira/browse/HIVE-11327
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler, Hive
>    Affects Versions: 0.14.0
>            Reporter: Yannik Zuehlke
>            Priority: Blocker
>
> I am using Hive 0.14 and HBase 0.98.8. I would like to use HiveQL to access an HBase "table".
> I created a table with a complex composite rowkey:
> ----
> {quote}
> CREATE EXTERNAL TABLE db.hive_hbase (rowkey struct<p1:string, p2:string, p3:string>, column1 string, column2 string)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> COLLECTION ITEMS TERMINATED BY ';'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = 
> ":key,cf:c1,cf:c2")
> TBLPROPERTIES("hbase.table.name"="hbase_table");
> {quote}
> ----
> The table is created successfully, but the following HiveQL query takes forever:
> ----
> {quote}
> SELECT * from db.hive_hbase WHERE rowkey.p1 = 'xyz';
> {quote}
> ----
> I am working with 1 TB of data (around 1.5 billion records), and this query takes forever (it ran overnight and had not finished by the morning).
> I set the log4j log level to DEBUG and found some interesting information:
> ----
> {quote}
> 2015-07-15 15:56:41,232 INFO  ppd.OpProcFactory    (OpProcFactory.java:logExpr(823)) - Pushdown Predicates of FIL For Alias : hive_hbase
> 2015-07-15 15:56:41,232 INFO  ppd.OpProcFactory (OpProcFactory.java:logExpr(826)) - (rowkey.p1 = 'xyz')
> {quote}
> ----
> But some lines later:
> ----
> {quote}
> 2015-07-15 15:56:41,430 DEBUG ppd.OpProcFactory (OpProcFactory.java:pushFilterToStorageHandler(1051)) - No pushdown possible for predicate:  (rowkey.p1 = 'xyz')
> {quote}
> ----
> So my guess is that HiveQL over HBase does not push the predicate down at all, and instead starts a MapReduce job that scans the whole table.
> The normal HBase scan (via the HBase Shell) takes around 5 seconds.
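For illustration, the kind of range scan a pushed-down predicate on the first key component could become is sketched below. This is a minimal sketch, not Hive or HBase code: it assumes the composite row key is stored as p1;p2;p3 (joined with the ';' collection-items delimiter from the DDL above), and the helper name prefix_scan_range is hypothetical. It computes the start row and exclusive stop row that would cover every key with rowkey.p1 = 'xyz'.

```python
def prefix_scan_range(first_component: str, delimiter: str = ";") -> tuple[bytes, bytes]:
    """Hypothetical helper (not part of Hive or HBase).

    Returns (start_row, stop_row) covering every row key whose first
    delimited component equals first_component, ASSUMING keys are
    stored as "p1;p2;p3". stop_row is exclusive, as in an HBase scan.
    """
    delim = delimiter.encode("utf-8")
    if len(delim) != 1 or delim[0] == 0xFF:
        raise ValueError("sketch assumes a single-byte delimiter below 0xFF")
    start = first_component.encode("utf-8") + delim
    # Bumping the trailing delimiter byte by one gives the smallest key
    # that no longer starts with `start` (';' is 0x3B, so it becomes '<').
    stop = start[:-1] + bytes([delim[0] + 1])
    return start, stop


start, stop = prefix_scan_range("xyz")
print(start, stop)  # b'xyz;' b'xyz<'
for key in (b"xyz;a;b", b"xyz1;a;b", b"xy;z;b"):
    # Only keys whose first component is exactly "xyz" fall inside [start, stop).
    print(key, start <= key < stop)
```

A scan bounded this way reads only the matching key range instead of every row, which is the behavior the report expects from predicate pushdown on the composite key.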



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
