spark-issues mailing list archives

From "Maziyar PANAHI (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26101) Spark Pipe() executes the external app by yarn username not the current username
Date Tue, 12 Feb 2019 21:19:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766471#comment-16766471 ]

Maziyar PANAHI commented on SPARK-26101:
----------------------------------------

I have worked around this issue as I described here:

[https://stackoverflow.com/a/53395055/1449151]
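
For anyone hitting the same problem, here is a minimal sketch of the idea behind that workaround, assuming the external binary only needs the correct HDFS identity rather than the correct OS user: pass HADOOP_USER_NAME to the piped process through the env parameter of RDD.pipe(). The binary path below is a hypothetical placeholder, and HADOOP_USER_NAME is only honoured on non-Kerberized clusters:

{code:java}
// Sketch only: export the submitting user's name into the piped process's
// environment, so the Hadoop client libraries inside the external app act
// as that user. "/path/to/external_app" is a placeholder.
val submittingUser = sc.sparkUser // user who started the SparkContext

val piped = sc.parallelize(Seq("test input"), 1)
  .pipe(Seq("/path/to/external_app"),
        Map("HADOOP_USER_NAME" -> submittingUser))

piped.collect()
{code}

The OS-level user of the child process is still yarn; only the Hadoop client identity changes, which is usually enough for HDFS reads and writes.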

 

However, I still believe that whichever application requests the YARN container should also
be responsible for passing the user along. In this case it is Spark, not me in a separate
application, that requests the container from YARN. If Spark's YARN containers run as the user
who submitted the job, RDD.pipe() should follow the same logic and not depend on per-user
configuration on YARN itself.

> Spark Pipe() executes the external app by yarn username not the current username
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-26101
>                 URL: https://issues.apache.org/jira/browse/SPARK-26101
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.3.0
>            Reporter: Maziyar PANAHI
>            Priority: Major
>
> Hello,
> I am using *Spark 2.3.0.cloudera3* on a Cloudera cluster. When I start my Spark session
(Zeppelin, shell, or spark-submit), my real username is impersonated successfully. That
allows YARN to pick the right queue based on the username, and HDFS knows my permissions.
(All of this works without any problem, meaning the cluster has been set up and configured
for user impersonation.)
> Example (running Spark as user panahi with YARN as the master):
> {code:java}
>  
> 18/11/17 13:55:47 INFO spark.SecurityManager: Changing view acls to: panahi
> 18/11/17 13:55:47 INFO spark.SecurityManager: Changing modify acls to: panahi
> 18/11/17 13:55:47 INFO spark.SecurityManager: Changing view acls groups to:
> 18/11/17 13:55:47 INFO spark.SecurityManager: Changing modify acls groups to:
> 18/11/17 13:55:47 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mpanahi); groups with view permissions: Set(); users with modify permissions: Set(panahi); groups with modify permissions: Set()
> ...
> 18/11/17 13:55:52 INFO yarn.Client:
> client token: N/A
> diagnostics: N/A
> ApplicationMaster host: N/A
> ApplicationMaster RPC port: -1
> queue: root.multivac
> start time: 1542459353040
> final status: UNDEFINED
> tracking URL: http://hadoop-master-1:8088/proxy/application_1542456252041_0006/
> user: panahi
> {code}
>  
> However, when I use *Spark RDD pipe()*, the external process is executed as the `*yarn*`
user. This makes it impossible to use an external app, such as a C/C++ application, that
needs read/write access to HDFS, because the user `*yarn*` has no permissions on the actual
user's directory. (Executing all external apps as the yarn user also raises other security
and resource-management issues.)
> *How to produce this issue:*
> {code:java}
> val test = sc.parallelize(Seq("test user")).repartition(1)
> val piped = test.pipe(Seq("whoami"))
> val c = piped.collect()
> result:
> test: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[26] at repartition at <console>:37
> piped: org.apache.spark.rdd.RDD[String] = PipedRDD[27] at pipe at <console>:37
> c: Array[String] = Array(yarn)
> {code}
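> To make the mismatch visible, the OS user and the Hadoop identity can be checked separately; a small sketch (illustrative only):
> {code:java}
> import org.apache.hadoop.security.UserGroupInformation
>
> // OS user of the piped child process:
> val osUser = sc.parallelize(Seq(""), 1).pipe(Seq("whoami")).collect()
> // => Array(yarn) on an affected cluster
>
> // Hadoop identity inside the executor JVM:
> val hadoopUser = sc.parallelize(Seq(""), 1)
>   .map(_ => UserGroupInformation.getCurrentUser.getShortUserName)
>   .collect()
> // => typically the submitting user, since Spark propagates it to executors
> {code}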
>  
> I believe that since Spark is the key actor invoking this execution inside the YARN cluster,
Spark needs to respect the actual/current username. Or maybe there is another configuration
for impersonation between Spark and YARN in this situation, but I haven't found one.
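> One candidate worth checking (an assumption on my side, not verified on this cluster) is the NodeManager's container executor: with the DefaultContainerExecutor on a non-Kerberized cluster, every container, and therefore every child process spawned by pipe(), runs as the NodeManager's own user, i.e. yarn, while the LinuxContainerExecutor can launch containers as the submitting user. A hedged yarn-site.xml sketch (property values illustrative; see the Hadoop docs):
> {code:xml}
> <!-- switch container launch from the yarn user to the submitting user -->
> <property>
>   <name>yarn.nodemanager.container-executor.class</name>
>   <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
> </property>
> <property>
>   <!-- in non-secure mode, run containers as the submitting user rather
>        than the configured local user (assumption: non-Kerberized setup) -->
>   <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
>   <value>false</value>
> </property>
> {code}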
>  
> Many thanks.


