hive-issues mailing list archives

From Sergio Peña (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HIVE-13825) Map joins with cloned tables with same locations, but different column names throw error exceptions
Date Mon, 23 May 2016 21:51:13 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297161#comment-15297161 ]

Sergio Peña commented on HIVE-13825:
------------------------------------

I dug into the code and found that the problem occurs when the table information is fetched
from {{getPathToPartitionInfo}}:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java#L178

{{getPathToPartitionInfo}} is a method of the {{MapWork}} class. It returns a HashMap whose
entries map a table location to that table's information:
   table-location => table-information

Before {{MapJoinProcessor}} runs, the HashMap is populated by the code below, where the {{t1}}
table information is overwritten by the {{t2}} table information because both tables have the
same table-location and a HashMap cannot hold duplicate keys:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java#L722

When {{MapJoinProcessor}} runs, it looks up the {{t1}} table information by table location
but gets the {{t2}} table information instead, which leads to the exception posted in this
ticket.
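
The key collision described above can be reproduced outside Hive with a plain HashMap. The following is only a minimal sketch, not Hive code: the {{PathKeyCollision}} class, the {{TableInfo}} type and the {{buildPathToInfo}} helper are hypothetical stand-ins for the structures that {{MapWork}} actually holds.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PathKeyCollision {

    // Hypothetical stand-in for the per-table information Hive keeps; not a Hive class.
    static final class TableInfo {
        final String name;
        final List<String> columns;

        TableInfo(String name, String... columns) {
            this.name = name;
            this.columns = Arrays.asList(columns);
        }
    }

    // Builds a location-keyed map the way the comment describes:
    // table-location => table-information.
    static Map<String, TableInfo> buildPathToInfo() {
        Map<String, TableInfo> pathToInfo = new HashMap<>();
        String location = "/user/hive/warehouse/test1";
        pathToInfo.put(location, new TableInfo("t1", "a", "b"));
        // Same key: this put replaces the t1 entry instead of adding a second one.
        pathToInfo.put(location, new TableInfo("t2", "c", "d"));
        return pathToInfo;
    }

    public static void main(String[] args) {
        Map<String, TableInfo> pathToInfo = buildPathToInfo();
        TableInfo info = pathToInfo.get("/user/hive/warehouse/test1");

        // Prints: 1 entry, table=t2, columns=[c, d]
        System.out.println(pathToInfo.size() + " entry, table=" + info.name
                + ", columns=" + info.columns);
        // Prints: has column a? false
        System.out.println("has column a? " + info.columns.contains("a"));
    }
}
```

A lookup by location can only ever return the last table registered for that path, which matches the "cannot find field a from [0:c, 1:d]" message in the logs.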

> Map joins with cloned tables with same locations, but different column names throw error exceptions
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13825
>                 URL: https://issues.apache.org/jira/browse/HIVE-13825
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergio Peña
>
> Two tables that share the same location cannot be used together in a JOIN query:
> {noformat}
> hive> create table t1 (a string, b string) location '/user/hive/warehouse/test1';
> OK
> hive> create table t2 (c string, d string) location '/user/hive/warehouse/test1';
> OK
> hive> select t1.a from t1 join t2 on t1.a = t2.c;
> ...
> 2016-05-23 16:39:57     Starting to launch local task to process map join;      maximum memory = 477102080
> Execution failed with exit status: 2
> Obtaining error information
> Task failed!
> Task ID:
>   Stage-4
> Logs:
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
> {noformat}
> The logs contain this exception:
> {noformat}
> 2016-05-23T16:39:58,163 ERROR [main]: mr.MapredLocalTask (:()) - Hive Runtime Error: Map local work failed
> java.lang.RuntimeException: cannot find field a from [0:c, 1:d]
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:485)
>         at org.apache.hadoop.hive.serde2.BaseStructObjectInspector.getStructFieldRef(BaseStructObjectInspector.java:133)
>         at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
>         at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:973)
>         at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:999)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:75)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:355)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:504)
>         at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:457)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:365)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:504)
>         at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:457)
>         at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:365)
>         at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.initializeOperators(MapredLocalTask.java:499)
>         at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:403)
>         at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:383)
>         at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:751)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
