hive-issues mailing list archives

From "Rui Li (JIRA)" <>
Subject [jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark
Date Tue, 13 Dec 2016 06:45:58 GMT


Rui Li commented on HIVE-13278:

Hi [~xuefuz], the conclusion is that we somehow try to read reduce.xml for a map-only job, and yes,
it happens with MR as well. The call path is {{HiveOutputFormatImpl.checkOutputSpecs -> Utilities.getMapRedWork}}.
HiveOutputFormatImpl needs the MapRedWork because it has to run some checks
on all the FS operators. Since an FS only exists at the end of a job, my suggestion is that we first
try to get the MapWork. If the MapWork contains an FS, this is a map-only job, so we
don't have to look for a ReduceWork. But [~stakiar] found that some map-only jobs may not have
an FS in the MapWork, e.g. {{ANALYZE TABLE}}. For a complete fix, we'll need a flag in
the JobConf indicating whether the job is map-only. Alternatively, we can use my solution, which solves the issue
for most cases.
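A minimal Java sketch of the lookup order suggested in the comment, under stated assumptions: PlanWork, PlanStore, CountingPlanStore and OutputSpecPlanLookup are hypothetical stand-ins (not Hive's MapWork/ReduceWork classes or the Utilities API), a Map<String, String> stands in for the JobConf, and MAP_ONLY_KEY is a made-up name for the map-only flag, which does not exist yet. The sketch reads the map plan first and only probes for reduce work when neither an FS in the MapWork nor the flag marks the job as map-only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a deserialized plan (map.xml or reduce.xml).
final class PlanWork {
    final String name;
    final boolean hasFileSink; // true if the plan contains an FS (FileSink) operator

    PlanWork(String name, boolean hasFileSink) {
        this.name = name;
        this.hasFileSink = hasFileSink;
    }
}

// Hypothetical plan source; reading a missing reduce.xml is what logs "File not found".
interface PlanStore {
    Optional<PlanWork> loadMapWork();
    Optional<PlanWork> loadReduceWork();
}

// Hypothetical helper that counts how often reduce.xml would have been read.
final class CountingPlanStore implements PlanStore {
    private final PlanWork map;
    private final PlanWork reduce; // null means "no reduce.xml exists"
    int reduceLoads = 0;

    CountingPlanStore(PlanWork map, PlanWork reduce) {
        this.map = map;
        this.reduce = reduce;
    }

    @Override public Optional<PlanWork> loadMapWork() {
        return Optional.ofNullable(map);
    }

    @Override public Optional<PlanWork> loadReduceWork() {
        reduceLoads++;
        return Optional.ofNullable(reduce);
    }
}

final class OutputSpecPlanLookup {
    // Hypothetical JobConf key for the proposed "this job is map-only" flag.
    static final String MAP_ONLY_KEY = "hypothetical.job.map-only";

    // Collect the plans whose FS operators the output-spec check must inspect.
    static List<PlanWork> worksToCheck(Map<String, String> conf, PlanStore store) {
        List<PlanWork> works = new ArrayList<>();
        Optional<PlanWork> mapWork = store.loadMapWork();
        mapWork.ifPresent(works::add);

        boolean flaggedMapOnly = Boolean.parseBoolean(conf.getOrDefault(MAP_ONLY_KEY, "false"));
        // An FS only appears at the end of a job, so an FS in the MapWork means map-only.
        boolean mapHasFs = mapWork.map(w -> w.hasFileSink).orElse(false);

        if (!flaggedMapOnly && !mapHasFs) {
            // Fallback: a map-only job such as ANALYZE TABLE may lack an FS in its MapWork,
            // so without the flag we still have to probe for reduce work.
            store.loadReduceWork().ifPresent(works::add);
        }
        return works;
    }
}
```

With an FS in the MapWork, worksToCheck returns only the map plan and reduceLoads stays 0. For an ANALYZE TABLE-like plan without an FS and without the flag, it still probes once; that residual case is exactly what the proposed flag would remove.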

Some special handling may be needed for HoS. For HoS, map.xml and reduce.xml each reside
in a different path. We can use {{}} to determine whether the JobConf is
for a MapWork or a ReduceWork, and then call getMapWork or getReduceWork accordingly.
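A sketch of the HoS-specific dispatch, again with hypothetical names: the property the comment refers to is elided, so PLAN_KIND_KEY and its "map"/"reduce" values are placeholders, a Map<String, String> stands in for the JobConf, and the two Suppliers stand in for getMapWork and getReduceWork. Only the selected supplier is invoked, so the plan file for the other side is never read.

```java
import java.util.Map;
import java.util.function.Supplier;

final class HosPlanDispatch {
    // Placeholder key: the actual JobConf property is not named in the comment.
    static final String PLAN_KIND_KEY = "hypothetical.hos.plan.kind";

    enum PlanKind { MAP, REDUCE }

    // Decide from the (stand-in) JobConf whether it belongs to a MapWork or a ReduceWork.
    static PlanKind kindOf(Map<String, String> conf) {
        String value = conf.get(PLAN_KIND_KEY);
        if ("map".equals(value)) {
            return PlanKind.MAP;
        }
        if ("reduce".equals(value)) {
            return PlanKind.REDUCE;
        }
        throw new IllegalArgumentException("cannot tell MapWork from ReduceWork: " + value);
    }

    // Load exactly one plan; the other supplier is never called, so its file is never read.
    static <W> W loadWork(Map<String, String> conf, Supplier<W> getMapWork, Supplier<W> getReduceWork) {
        return kindOf(conf) == PlanKind.MAP ? getMapWork.get() : getReduceWork.get();
    }
}
```

Throwing on an unknown value is a deliberate choice in this sketch: silently falling back to probing both files would bring the redundant 'File not found' messages back.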

> Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark
> ------------------------------------------------------------------------------------------------------------
>                 Key: HIVE-13278
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>         Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>            Reporter: Xin Hao
>            Assignee: Sahil Takiar
>            Priority: Minor
> Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark.
> They do not prevent the query from completing successfully, so the issue is marked as Minor.
> Error message example:
> {noformat}
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(
>         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
>         at org.apache.hadoop.ipc.RPC$
>         at org.apache.hadoop.ipc.Server$Handler$
>         at org.apache.hadoop.ipc.Server$Handler$
>         at Method)
>         at
>         at
>         at org.apache.hadoop.ipc.Server$
> {noformat}

This message was sent by Atlassian JIRA
