spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Percy <mpe...@apache.org>
Subject Re: Spark streaming on YARN?
Date Fri, 10 Jan 2014 11:04:55 GMT
Thanks Tathagata. That helped a lot, but I am having some trouble under
YARN with the HdfsWordCount example.

I was able to get the example to work locally, and was also able to submit
the job to the YARN cluster, but it looks like it is crashing under YARN.
The streaming job stops after about 30 seconds, right after it runs, and
before I'm able to put anything new into the input directory. This is the
command I am running on the command line:

export HADOOP_CONF_DIR=/etc/hadoop/conf
SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar
./spark-class org.apache.spark.deploy.yarn.Client --jar
examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar
--class org.apache.spark.streaming.examples.HdfsWordCount --args
yarn-standalone --args hdfs:///user/mpercy/hdfswordcount-test2
--num-workers 3 --master-memory 4g --worker-memory 2g --worker-cores 1

This is the kind of output I am getting in the YARN NodeManager log file:

2014-01-09 20:13:29,249 INFO
org.apache.spark.executor.CoarseGrainedExecutorBackend: Connecting to
driver: akka://spark@sparktest-01:58117/user/CoarseGrainedScheduler
2014-01-09 20:13:29,358 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
out status for container: container_id { app_attempt_id { application_id {
id: 8 cluster_timestamp: 1389304540039 } attemptId: 1 } id: 4 } state:
C_RUNNING diagnostics: "" exit_status: -1000
2014-01-09 20:13:29,476 ERROR
org.apache.spark.executor.CoarseGrainedExecutorBackend: Driver terminated
or disconnected! Shutting down.
2014-01-09 20:13:29,825 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
code from container container_1389304540039_0008_01_000004 is : 1
2014-01-09 20:13:29,825 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Exception from container-launch with container ID:
container_1389304540039_0008_01_000004 and exit code: 1
org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
        at org.apache.hadoop.util.Shell.run(Shell.java:379)
        at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
        at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
        at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
        at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
2014-01-09 20:13:29,825 INFO
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
2014-01-09 20:13:29,825 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Container exited with a non-zero exit code 1
2014-01-09 20:13:29,826 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1389304540039_0008_01_000004 transitioned from RUNNING
to EXITED_WITH_FAILURE

While it was difficult to get the logs from YARN before it deleted them
during job cleanup, I finally did and all I got was this from stderr
(stdout file was empty):

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/media/ephemeral0/yarn/nm/usercache/mpercy/filecache/53/spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/media/ephemeral0/yarn/nm/usercache/mpercy/filecache/52/spark-examples-assembly-0.8.1-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Not super useful AFAICT, since I'm pretty sure SLF4J will pick the first
binding so I doubt that was the cause of the crash. Any suggestions on how
to proceed?

One guess is that I am passing the args wrong. I'm new to Scala so I'm not
sure whether I'm reading the ClientArguments code right, but based on the
comments in one of the files I think passing --args multiple times is the
right way to do it.

And just for good measure, this is what is being executed by YARN's
launch_container.sh script:

exec /bin/bash -c "$JAVA_HOME/bin/java -server -Xmx4096m
-Djava.io.tmpdir=$PWD/tmp org.apache.spark.deploy.yarn.ApplicationMaster
--class org.apache.spark.streaming.examples.HdfsWordCount --jar
examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar
--args  'yarn-standalone'  --args
 'hdfs:///user/mpercy/hdfswordcount-test2'  --worker-memory 2048
--worker-cores 1 --num-workers 3 1>
/var/log/hadoop-yarn/container/application_1389304540039_0052/container_1389304540039_0052_01_000001/stdout
2>
/var/log/hadoop-yarn/container/application_1389304540039_0052/container_1389304540039_0052_01_000001/stderr

Would love to hear any suggestions for how to debug this further!

Thanks,
Mike



On Thu, Jan 9, 2014 at 5:44 PM, Tathagata Das
<tathagata.das1565@gmail.com>wrote:

> If you have been able to run Spark Pi to run on YARN, then you should be
> able to run the streaming example HdfsWordCount<https://github.com/apache/incubator-spark/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/HdfsWordCount.scala>
as
> well. Even though the instructions in the example says to run it on local
> machine, you can run the example on YARN as well in the same way as Spark
> PI. You would just have to give the appropriate Spark master url and use an
> HDFS directory as the 2nd parameter. Then any text file written to that
> HDFS directory will get "word counted".
>
> Note that you should write a file to that HDFS directory by moving the
> file from some other directory to that directory. For example if the HDFS
> directory that you want to use to run the example is
> *hdfs://myhdfs:9000/mydir/* , then you can first copy a local file (say
> new_file) to "*hdfs://myhdfs:9000/temp_location/new_file *" then do a
> move it to "*hdfs://myhdfs:9000/mydir/new_file*".
>
>
>
>
> On Thu, Jan 9, 2014 at 5:29 PM, Mike Percy <mpercy@apache.org> wrote:
>
>> After looking through the docs, grepping the commit logs and looking on
>> the list archives, I have been unable to see an indication or example of
>> Spark streaming working on YARN. Is this possible yet? So far, I've gotten
>> at least the Spark Pi example to run on YARN with CDH5 beta 1.
>>
>> I am about to dig into the code and try to figure out how the batch Yarn
>> client works, to see how much work it would be to set up an AM to run an
>> InputDStream, but thought I'd make it easy on myself ask here first before
>> I got started.
>>
>> Thanks in advance for any pointers,
>> Mike
>>
>>
>

Mime
View raw message