[ https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16587448#comment-16587448
]
Naveen Gangam commented on HIVE-7248:
-------------------------------------
[~navis] looks like the precommit build for the new patch did not run and exited with some
error. Should we retry? Thanks
> UNION ALL in hive returns incorrect results on Hbase backed table
> -----------------------------------------------------------------
>
> Key: HIVE-7248
> URL: https://issues.apache.org/jira/browse/HIVE-7248
> Project: Hive
> Issue Type: Bug
> Components: HBase Handler
> Affects Versions: 0.12.0, 0.13.0, 0.13.1
> Reporter: Mala Chikka Kempanna
> Assignee: Navis
> Priority: Major
> Attachments: HIVE-7248.1.patch.txt, HIVE-7248.2.patch.txt, HIVE-7248.3.patch.txt
>
>
> The issue can be recreated with following steps
> 1) In hbase
> create 'TABLE_EMP','default'
> 2) On hive
> sudo -u hive hive
> CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME string,CDS_UPDATED_DATE string,CDS_PK
string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES("hbase.columns.mapping"
= "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", "hbase.scan.cache"
= "500", "hbase.scan.cacheblocks" = "false" ) TBLPROPERTIES("hbase.table.name" = "TABLE_EMP",'serialization.null.format'='');
> 3) On hbase insert the following data
> put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini'
> put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P'
> put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'
> put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind'
> put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K'
> put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'
> 4) On hive execute the following query
> hive
> SELECT *
> FROM (
> SELECT CDS_PK
> FROM TABLE_EMP
> WHERE
> CDS_PK >= '0'
> AND CDS_PK <= '9'
> AND CDS_UPDATED_DATE IS NOT NULL
> UNION ALL SELECT CDS_PK
> FROM TABLE_EMP
> WHERE
> CDS_PK >= 'a'
> AND CDS_PK <= 'z'
> AND CDS_UPDATED_DATE IS NOT NULL
> )t ;
> 5) Output of the query
> 1
> 1
> 2
> 2
> 6) Output of just
> SELECT CDS_PK
> FROM TABLE_EMP
> WHERE
> CDS_PK >= '0'
> AND CDS_PK <= '9'
> AND CDS_UPDATED_DATE IS NOT NULL
> is
> 1
> 2
> 7) Output of just
> SELECT CDS_PK
> FROM TABLE_EMP
> WHERE
> CDS_PK >= 'a'
> AND CDS_PK <= 'z'
> AND CDS_UPDATED_DATE IS NOT NULL
> Empty
> 8) UNION is used to combine the result from multiple SELECT statements into a single
result set. Hive currently only supports UNION ALL (bag union), in which duplicates are not
eliminated
> Accordingly above query should return output
> 1
> 2
> instead it is giving wrong output
> 1
> 1
> 2
> 2
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
|