hive-issues mailing list archives

From "Puneet Jain (JIRA)" <>
Subject [jira] [Commented] (HIVE-18858) System properties in job configuration not resolved when submitting MR job
Date Mon, 06 Aug 2018 08:35:00 GMT


Puneet Jain commented on HIVE-18858:


This change seems to have broken working scenarios with Hive on MR. We now see that hadoop.tmp.dir
is always set to /tmp/hadoop-hive (in job.xml). This causes problems on a multi-tenant Hadoop cluster,
since the tmp folder ends up owned by whichever user executes a job first, and other users
then fail to write to it.

E.g. User1 runs a job and /tmp/hadoop-hive is created on the worker node, owned by user1.
Subsequently user2 tries to run a job, and it fails because user2 has no write permission on /tmp/hadoop-hive/

The old behavior allowed multiple tenants to write to their respective tmp folders, which was secure
and contention-free: User1 - /tmp/hadoop-user1, User2 - /tmp/hadoop-user2.
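
To illustrate the difference, here is a minimal sketch in plain Java (not Hadoop code). It assumes the stock Hadoop default of hadoop.tmp.dir = /tmp/hadoop-${user.name}, and it assumes that the value now gets resolved once in a process running as user "hive", which would explain the /tmp/hadoop-hive path seen in job.xml. The class and method names are made up for the example.

```java
public class TmpDirPerUser {
    // Default value of hadoop.tmp.dir in Hadoop's core-default.xml.
    static final String TEMPLATE = "/tmp/hadoop-${user.name}";

    // Substitute ${user.name} with the name of the user whose process resolves the value.
    static String resolveFor(String userName) {
        return TEMPLATE.replace("${user.name}", userName);
    }

    public static void main(String[] args) {
        // Resolved late, in each tenant's own process: one directory per user.
        System.out.println(resolveFor("user1")); // /tmp/hadoop-user1
        System.out.println(resolveFor("user2")); // /tmp/hadoop-user2
        // Resolved early, in a process assumed to run as "hive": one directory
        // shared by every tenant, owned by whoever creates it first.
        System.out.println(resolveFor("hive"));  // /tmp/hadoop-hive
    }
}
```

With late resolution each tenant gets a separate directory; with early resolution every job carries the same literal path.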




> System properties in job configuration not resolved when submitting MR job
> --------------------------------------------------------------------------
>                 Key: HIVE-18858
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>         Environment: Hadoop 3.0.0
>            Reporter: Daniel Voros
>            Assignee: Daniel Voros
>            Priority: Major
>             Fix For: 3.0.0
>         Attachments: HIVE-18858.1.patch, HIVE-18858.2.patch, HIVE-18858.3.patch
> Since [this hadoop commit|], which was first released in 3.0.0, Configuration has a restricted mode that disables the resolution of system properties (which normally happens when a configuration option is retrieved).
> This leads to test failures when switching to Hadoop 3.0.0 (instead of 3.0.0-beta1), since we're relying on the [substitution of test.tmp.dir|] during the [maven build|]. See test results on HIVE-18327.
> When we're passing job configurations to Hadoop, I believe there's no way to disable the restricted mode, since we go through some Hadoop MR calls first, see here:
> {code}
> "HiveServer2-Background-Pool: Thread-105@9500" prio=5 tid=0x69 nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
> 	  at org.apache.hadoop.conf.Configuration.addResourceObject(
> 	  - locked <0x2fe6> (a org.apache.hadoop.mapred.JobConf)
> 	  at org.apache.hadoop.conf.Configuration.addResource(
> 	  at org.apache.hadoop.mapred.JobConf.<init>(
> 	  at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(
> 	  at org.apache.hadoop.mapred.LocalJobRunner.submitJob(
> 	  at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(
> 	  at org.apache.hadoop.mapreduce.Job$
> 	  at org.apache.hadoop.mapreduce.Job$
> 	  at
> 	  at
> 	  at
> 	  at org.apache.hadoop.mapreduce.Job.submit(
> 	  at org.apache.hadoop.mapred.JobClient$
> 	  at org.apache.hadoop.mapred.JobClient$
> 	  at
> 	  at
> 	  at
> 	  at org.apache.hadoop.mapred.JobClient.submitJobInternal(
> 	  at org.apache.hadoop.mapred.JobClient.submitJob(
> 	  at
> 	  at
> 	  at org.apache.hadoop.hive.ql.exec.Task.executeTask(
> 	  at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(
> 	  at org.apache.hadoop.hive.ql.Driver.launchTask(
> 	  at org.apache.hadoop.hive.ql.Driver.execute(
> 	  at org.apache.hadoop.hive.ql.Driver.runInternal(
> 	  at
> 	  at
> 	  at org.apache.hive.service.cli.operation.SQLOperation.runQuery(
> 	  at org.apache.hive.service.cli.operation.SQLOperation.access$700(
> 	  at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$
> 	  at
> 	  at
> 	  at
> 	  at org.apache.hive.service.cli.operation.SQLOperation$
> 	  at java.util.concurrent.Executors$
> 	  at
> 	  at java.util.concurrent.ThreadPoolExecutor.runWorker(
> 	  at java.util.concurrent.ThreadPoolExecutor$
> 	  at
> {code}
> I suggest resolving all variables before passing the configuration to Hadoop in ExecDriver.
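
The suggestion above can be sketched roughly as follows. This is plain Java over a Map, not the actual patch and not Hadoop's Configuration class: the class name, the helper methods, and the key example.scratch.dir are all hypothetical. It does a single pass that substitutes ${name} from system properties only and leaves unknown names untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EagerResolve {
    // Matches ${name} where name contains neither '}' nor '$'.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}$]+)\\}");

    // Replace each ${name} with the system property of that name;
    // a reference to an unset property is left as-is.
    static String resolve(String value) {
        Matcher m = VAR.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String prop = System.getProperty(m.group(1));
            String replacement = (prop != null) ? prop : m.group(0);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Return a copy of the configuration with every value resolved.
    static Map<String, String> resolveAll(Map<String, String> conf) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            resolved.put(e.getKey(), resolve(e.getValue()));
        }
        return resolved;
    }

    public static void main(String[] args) {
        System.setProperty("test.tmp.dir", "/build/tmp");
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("example.scratch.dir", "${test.tmp.dir}/scratch"); // hypothetical key
        System.out.println(resolveAll(conf)); // {example.scratch.dir=/build/tmp/scratch}
    }
}
```

Note that resolving everything up front also freezes values such as ${user.name} to whatever they are in the submitting process.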

This message was sent by Atlassian JIRA