spark-user mailing list archives

From Matt Tenenbaum <matt.tenenb...@rockyou.com>
Subject Re: spark-shell with different username
Date Sat, 02 Apr 2016 17:09:34 GMT
Hello Sebastian —

When I add that parameter, it never gets to the point where it can
communicate with the resource manager. I see an endless series of
ConnectExceptions, with a message and stack trace like this:

16/04/02 09:59:45 INFO client.ConfiguredRMFailoverProxyProvider:
Failing over to rm236
16/04/02 09:59:45 INFO retry.RetryInvocationHandler: Exception while
invoking getClusterMetrics of class
ApplicationClientProtocolPBClientImpl over rm236 after 62 fail over
attempts. Trying to fail over immediately.
16/04/02 09:59:45 INFO client.ConfiguredRMFailoverProxyProvider:
Failing over to rm238
16/04/02 09:59:46 INFO retry.RetryInvocationHandler: Exception while
invoking getClusterMetrics of class
ApplicationClientProtocolPBClientImpl over rm238 after 63 fail over
attempts. Trying to fail over immediately.
java.net.ConnectException: Call From laptop.local/192.168.1.112 to
resource-manager:8032 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
    at org.apache.hadoop.ipc.Client.call(Client.java:1472)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy15.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:202)
    at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:461)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:129)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:129)
    at org.apache.spark.Logging$class.logInfo(Logging.scala:58)
    at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:62)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:128)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
    at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
    at $line3.$read$$iwC$$iwC.<init>(<console>:15)
    at $line3.$read$$iwC.<init>(<console>:24)
    at $line3.$read.<init>(<console>:26)
    at $line3.$read$.<init>(<console>:30)
    at $line3.$read$.<clinit>(<console>)
    at $line3.$eval$.<init>(<console>:7)
    at $line3.$eval$.<clinit>(<console>)
    at $line3.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)
    at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
    at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
    at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
    at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
    at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
    at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
    at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
    at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:163)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:161)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
    at org.apache.hadoop.ipc.Client.call(Client.java:1438)
    ... 72 more
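
For reference, the command here is the spark-shell invocation from my original
message below, with the suggested flag added (using my cluster username as the
value); roughly:

[matt@laptop ~]$ HADOOP_CONF_DIR=path/to/hadoop/conf SPARK_HOME=spark-1.6.0-bin-hadoop2.6 spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client --proxy-user matt.tenenbaum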

On Sat, Apr 2, 2016 at 3:03 AM, Sebastian YEPES FERNANDEZ <syepes@gmail.com>
wrote:

> Matt, have you tried using the parameter --proxy-user matt?
> On Apr 2, 2016 8:17 AM, "Mich Talebzadeh" <mich.talebzadeh@gmail.com>
> wrote:
>
>> Matt,
>>
>> What OS are you using on your laptop? Sounds like Ubuntu or something?
>>
>> Thanks
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 2 April 2016 at 01:17, Matt Tenenbaum <matt.tenenbaum@rockyou.com>
>> wrote:
>>
>>> Hello all —
>>>
>>> tl;dr: I’m having an issue running spark-shell from my laptop (or other
>>> non-cluster-affiliated machine), and I think the issue boils down to
>>> usernames. Can I convince spark/scala that I’m someone other than $USER?
>>>
>>> A bit of background: our cluster is CDH 5.4.8, installed with Cloudera
>>> Manager 5.5. We use LDAP, and my login on all hadoop-affiliated machines
>>> (including the gateway boxes we use for running scheduled work) is
>>> ‘matt.tenenbaum’. When I run spark-shell on one of those machines,
>>> everything is fine:
>>>
>>> [matt.tenenbaum@remote-machine ~]$ HADOOP_CONF_DIR=/etc/hadoop/conf SPARK_HOME=spark-1.6.0-bin-hadoop2.6 spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client
>>>
>>> Everything starts up correctly, I get a scala prompt, the SparkContext
>>> and SQL context are correctly initialized, and I’m off to the races:
>>>
>>> 16/04/01 23:27:00 INFO session.SessionState: Created local directory: /tmp/35b58974-dad5-43c6-9864-43815d101ca0_resources
>>> 16/04/01 23:27:00 INFO session.SessionState: Created HDFS directory: /tmp/hive/matt.tenenbaum/35b58974-dad5-43c6-9864-43815d101ca0
>>> 16/04/01 23:27:00 INFO session.SessionState: Created local directory: /tmp/matt.tenenbaum/35b58974-dad5-43c6-9864-43815d101ca0
>>> 16/04/01 23:27:00 INFO session.SessionState: Created HDFS directory: /tmp/hive/matt.tenenbaum/35b58974-dad5-43c6-9864-43815d101ca0/_tmp_space.db
>>> 16/04/01 23:27:00 INFO repl.SparkILoop: Created sql context (with Hive support)..
>>> SQL context available as sqlContext.
>>>
>>> scala> 1 + 41
>>> res0: Int = 42
>>>
>>> scala> sc
>>> res1: org.apache.spark.SparkContext = org.apache.spark.SparkContext@4e9bd2c8
>>>
>>> I am running 1.6 from a downloaded tgz file, rather than the spark-shell
>>> made available to the cluster from CDH. I can copy that tgz to my laptop,
>>> and grab a copy of the cluster configurations, and in a perfect world I
>>> would then be able to run everything in the same way
>>>
>>> [matt@laptop ~]$ HADOOP_CONF_DIR=path/to/hadoop/conf SPARK_HOME=spark-1.6.0-bin-hadoop2.6 spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client
>>>
>>> Notice there are two things that are different:
>>>
>>>    1. My local username on my laptop is ‘matt’, which does not match my
>>>    name on the remote machine.
>>>    2. The Hadoop configs live somewhere other than /etc/hadoop/conf
>>>
>>> Alas, #1 proves fatal because of cluster permissions (there is no
>>> /user/matt/ in HDFS, and ‘matt’ is not a valid LDAP user). In the
>>> initialization logging output, I can see that fail in an expected way:
>>>
>>> 16/04/01 16:37:19 INFO yarn.Client: Setting up container launch context for our
AM
>>> 16/04/01 16:37:19 INFO yarn.Client: Setting up the launch environment for our
AM container
>>> 16/04/01 16:37:19 INFO yarn.Client: Preparing resources for our AM container
>>> 16/04/01 16:37:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library
for your platform... using builtin-java classes where applicable
>>> 16/04/01 16:37:21 ERROR spark.SparkContext: Error initializing SparkContext.
>>> org.apache.hadoop.security.AccessControlException: Permission denied: user=matt,
access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
>>>     at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
>>>     at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
>>>     at (... etc ...)
>>>
>>> Fine. In other circumstances I’ve told Hadoop explicitly who I am by
>>> setting HADOOP_USER_NAME. Maybe that works here?
>>>
>>> [matt@laptop ~]$ HADOOP_USER_NAME=matt.tenenbaum HADOOP_CONF_DIR=soma-conf SPARK_HOME=spark-1.6.0-bin-hadoop2.6 spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client
>>>
>>> Eventually that fails too, but not for the same reason. Setting
>>> HADOOP_USER_NAME is sufficient to allow initialization to get past the
>>> access-control problems, and I can see it request a new application from
>>> the cluster
>>>
>>> 16/04/01 16:43:08 INFO yarn.Client: Will allocate AM container, with 896 MB memory
including 384 MB overhead
>>> 16/04/01 16:43:08 INFO yarn.Client: Setting up container launch context for our
AM
>>> 16/04/01 16:43:08 INFO yarn.Client: Setting up the launch environment for our
AM container
>>> 16/04/01 16:43:08 INFO yarn.Client: Preparing resources for our AM container
>>> ... [resource uploads happen here] ...
>>> 16/04/01 16:46:16 INFO spark.SecurityManager: Changing view acls to: matt,matt.tenenbaum
>>> 16/04/01 16:46:16 INFO spark.SecurityManager: Changing modify acls to: matt,matt.tenenbaum
>>> 16/04/01 16:46:16 INFO spark.SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(matt, matt.tenenbaum); users
with modify permissions: Set(matt, matt.tenenbaum)
>>> 16/04/01 16:46:16 INFO yarn.Client: Submitting application 30965 to ResourceManager
>>> 16/04/01 16:46:16 INFO impl.YarnClientImpl: Submitted application application_1451332794331_30965
>>> 16/04/01 16:46:17 INFO yarn.Client: Application report for application_1451332794331_30965
(state: ACCEPTED)
>>> 16/04/01 16:46:17 INFO yarn.Client:
>>>      client token: N/A
>>>      diagnostics: N/A
>>>      ApplicationMaster host: N/A
>>>      ApplicationMaster RPC port: -1
>>>      queue: root.matt_dot_tenenbaum
>>>      start time: 1459554373844
>>>      final status: UNDEFINED
>>>      tracking URL: http://resource-manager:8088/proxy/application_1451332794331_30965/
>>>      user: matt.tenenbaum
>>> 16/04/01 16:46:19 INFO yarn.Client: Application report for application_1451332794331_30965
(state: ACCEPTED)
>>>
>>> but this AM never switches state from ACCEPTED to RUNNING. Eventually it
>>> times out and kills the AM
>>>
>>> 16/04/01 16:50:14 INFO yarn.Client: Application report for application_1451332794331_30965
(state: FAILED)
>>> 16/04/01 16:50:14 INFO yarn.Client:
>>>      client token: N/A
>>>      diagnostics: Application application_1451332794331_30965 failed 2 times
due to AM Container for appattempt_1451332794331_30965_000002 exited with  exitCode: 10
>>> For more detailed output, check application tracking page:http://resource-manager:8088/proxy/application_1451332794331_30965/Then,
click on links to logs of each attempt.
>>> Diagnostics: Exception from container-launch.
>>> Container id: container_e43_1451332794331_30965_02_000001
>>> Exit code: 10
>>> Stack trace: ExitCodeException exitCode=10:
>>>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>>>     at org.apache.hadoop.util.Shell.run(Shell.java:460)
>>>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>>>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:293)
>>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>>>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>     at java.lang.Thread.run(Thread.java:745)
>>>
>>> Shell output: main : command provided 1
>>> main : user is yarn
>>> main : requested yarn user is matt.tenenbaum
>>>
>>> Container exited with a non-zero exit code 10
>>> Failing this attempt. Failing the application.
>>>      ApplicationMaster host: N/A
>>>      ApplicationMaster RPC port: -1
>>>      queue: root.matt_dot_tenenbaum
>>>      start time: 1459554373844
>>>      final status: FAILED
>>>      tracking URL: http://resource-manager:8088/cluster/app/application_1451332794331_30965
>>>      user: matt.tenenbaum
>>> 16/04/01 16:50:15 ERROR spark.SparkContext: Error initializing SparkContext.
>>> org.apache.spark.SparkException: Yarn application has already ended! It might
have been killed or unable to launch application master.
>>>     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:124)
>>>     at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:64)
>>>     at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
>>>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
>>>     at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
>>>     at $line3.$read$iwC$iwC.<init>(<console>:15)
>>>     at $line3.$read$iwC.<init>(<console>:24)
>>>     at $line3.$read.<init>(<console>:26)
>>>     at $line3.$read$.<init>(<console>:30)
>>>     at $line3.$read$.<clinit>(<console>)
>>>     at $line3.$eval$.<init>(<console>:7)
>>>     at $line3.$eval$.<clinit>(<console>)
>>>     at $line3.$eval.$print(<console>)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>>     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>>>     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>>     at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>>     at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>>     at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>>     at org.apache.spark.repl.SparkILoopInit$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:125)
>>>     at org.apache.spark.repl.SparkILoopInit$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
>>>     at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
>>>     at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
>>>     at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
>>>     at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
>>>     at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
>>>     at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
>>>     at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
>>>     at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
>>>     at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply$mcZ$sp(SparkILoop.scala:991)
>>>     at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:945)
>>>     at org.apache.spark.repl.SparkILoop$anonfun$org$apache$spark$repl$SparkILoop$process$1.apply(SparkILoop.scala:945)
>>>     at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$process(SparkILoop.scala:945)
>>>     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>>     at org.apache.spark.repl.Main$.main(Main.scala:31)
>>>     at org.apache.spark.repl.Main.main(Main.scala)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731)
>>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>> In the end, I’m left at a scala prompt but (obviously) without sc or
>>> sqlContext
>>>
>>> <console>:16: error: not found: value sqlContext
>>>          import sqlContext.implicits._
>>>                 ^
>>> <console>:16: error: not found: value sqlContext
>>>          import sqlContext.sql
>>>                 ^
>>>
>>> scala>
>>>
>>> A bit of googling and reading on Stack Overflow suggests that this all
>>> boils down to the SecurityManager, and the difference between running on the
>>> remote machine, where the shell user matches the expected Hadoop user (so
>>> spark.SecurityManager sees Set(matt.tenenbaum)), vs. running on my laptop,
>>> where the SecurityManager sees Set(matt, matt.tenenbaum). I also tried
>>> manually setting the SPARK_IDENT_STRING and USER environment variables to
>>> “matt.tenenbaum”, but that doesn’t change the outcome.
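>>>
>>> Concretely, that last attempt looked roughly like this (all three variables
>>> set to my cluster username, same conf dir as above):
>>>
>>> [matt@laptop ~]$ SPARK_IDENT_STRING=matt.tenenbaum USER=matt.tenenbaum HADOOP_USER_NAME=matt.tenenbaum HADOOP_CONF_DIR=soma-conf SPARK_HOME=spark-1.6.0-bin-hadoop2.6 spark-1.6.0-bin-hadoop2.6/bin/spark-shell --master yarn --deploy-mode client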
>>>
>>> Am I even on the right track? Is this because of a mismatch between who
>>> I am on my laptop and who the cluster wants me to be? Is there any way to
>>> convince my local spark-shell invocation that I’m “matt.tenenbaum”, not
>>> “matt”?
>>>
>>> Thank you for reading this far, and for any suggestions
>>> -mt
>>>
>>
