spark-user mailing list archives

From Mike Percy <mpe...@apache.org>
Subject Re: Spark streaming on YARN?
Date Fri, 10 Jan 2014 13:16:10 GMT
OK, I got a bit farther, but still no dice. I hacked up the
HdfsWordCount.scala file to add a bunch of print statements that write
their output to a file. Apologies for the messy code; like I said, I'm new
to Scala. The file is here:

https://gist.github.com/mpercy/8351573

Apparently it's executing and getting somewhat far into the script, so I
believe I'm actually passing the args correctly. This is the output I am
getting from the above in /tmp/thing.txt (great name, I know):

hello world
still here: yarn-standalone, hdfs:///user/systest/hdfswordcount-test2
still here again ok: org.apache.spark.streaming.StreamingContext@7a237188
created stream: org.apache.spark.streaming.dstream.MappedDStream@6f4b06be
printing word counts:
org.apache.spark.streaming.dstream.ShuffledDStream@5442e1da

So it looks like it hangs or gets killed when executing wordCounts.print()
... but that's all the info I've been able to glean so far.

I'm not sure I'm catching the Throwable properly, if there is one (I'm
trying to get a stack trace). I wonder if it's getting kill -9ed by
somebody... but I'd only expect that from YARN if there were an
OutOfMemoryError, and in that case I'd expect to catch the OOME in my try
block and be able to print it, unless something threw again inside my
catch.
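
For reference, here is a simplified sketch of what the instrumentation
looks like (the real code is in the gist above; logToFile is a stand-in
helper that appends a line to /tmp/thing.txt):

import java.io.{FileWriter, PrintWriter}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object HdfsWordCount {
  // Append a line to /tmp/thing.txt so progress is visible even if stdout is lost.
  def logToFile(msg: String) {
    val out = new PrintWriter(new FileWriter("/tmp/thing.txt", true))
    try { out.println(msg) } finally { out.close() }
  }

  def main(args: Array[String]) {
    try {
      logToFile("hello world")
      logToFile("still here: " + args(0) + ", " + args(1))
      val ssc = new StreamingContext(args(0), "HdfsWordCount", Seconds(2),
        System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_EXAMPLES_JAR")))
      logToFile("still here again ok: " + ssc)
      val lines = ssc.textFileStream(args(1))
      logToFile("created stream: " + lines)
      val words = lines.flatMap(_.split(" "))
      val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
      logToFile("printing word counts:")
      logToFile(wordCounts.toString)
      wordCounts.print()
      logToFile("called print, starting context") // never shows up in the output above
      ssc.start()
    } catch {
      case t: Throwable =>
        // Write the stack trace to the same file, in case stderr is lost too.
        val out = new PrintWriter(new FileWriter("/tmp/thing.txt", true))
        try { t.printStackTrace(out) } finally { out.close() }
    }
  }
}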

Any suggestions on where to go from here?

Thanks,
Mike


On Fri, Jan 10, 2014 at 3:04 AM, Mike Percy <mpercy@apache.org> wrote:

> Thanks Tathagata. That helped a lot, but I am having some trouble under
> YARN with the HdfsWordCount example.
>
> I was able to get the example to work locally, and was also able to submit
> the job to the YARN cluster, but it looks like it is crashing under YARN.
> The streaming job stops after about 30 seconds, shortly after it starts and
> before I'm able to put anything new into the input directory. This is the
> command I am running on the command line:
>
> export HADOOP_CONF_DIR=/etc/hadoop/conf
> SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar
> ./spark-class org.apache.spark.deploy.yarn.Client --jar
> examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar
> --class org.apache.spark.streaming.examples.HdfsWordCount --args
> yarn-standalone --args hdfs:///user/mpercy/hdfswordcount-test2
> --num-workers 3 --master-memory 4g --worker-memory 2g --worker-cores 1
>
> This is the kind of output I am getting in the YARN NodeManager log file:
>
> 2014-01-09 20:13:29,249 INFO
> org.apache.spark.executor.CoarseGrainedExecutorBackend: Connecting to
> driver: akka://spark@sparktest-01:58117/user/CoarseGrainedScheduler
> 2014-01-09 20:13:29,358 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending
> out status for container: container_id { app_attempt_id { application_id {
> id: 8 cluster_timestamp: 1389304540039 } attemptId: 1 } id: 4 } state:
> C_RUNNING diagnostics: "" exit_status: -1000
> 2014-01-09 20:13:29,476 ERROR
> org.apache.spark.executor.CoarseGrainedExecutorBackend: Driver terminated
> or disconnected! Shutting down.
> 2014-01-09 20:13:29,825 WARN
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit
> code from container container_1389304540039_0008_01_000004 is : 1
> 2014-01-09 20:13:29,825 WARN
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
> Exception from container-launch with container ID:
> container_1389304540039_0008_01_000004 and exit code: 1
> org.apache.hadoop.util.Shell$ExitCodeException:
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
>         at org.apache.hadoop.util.Shell.run(Shell.java:379)
>         at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
>         at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>         at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
>         at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> 2014-01-09 20:13:29,825 INFO
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
> 2014-01-09 20:13:29,825 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Container exited with a non-zero exit code 1
> 2014-01-09 20:13:29,826 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1389304540039_0008_01_000004 transitioned from RUNNING
> to EXITED_WITH_FAILURE
>
> It was difficult to get the logs from YARN before it deleted them during
> job cleanup, but I finally managed to, and all I got was this from stderr
> (the stdout file was empty):
>
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/media/ephemeral0/yarn/nm/usercache/mpercy/filecache/53/spark-assembly-0.8.1-incubating-hadoop2.2.0-cdh5.0.0-beta-1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/media/ephemeral0/yarn/nm/usercache/mpercy/filecache/52/spark-examples-assembly-0.8.1-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>
> Not super useful AFAICT, since I'm pretty sure SLF4J just picks the first
> binding, so I doubt that was the cause of the crash. Any suggestions on
> how to proceed?
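>
> (For what it's worth: with log aggregation enabled, something like
> "yarn logs -applicationId <app id>" should fetch the container logs after
> the app finishes, though I haven't verified that on this CDH5 beta.)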
>
> One guess is that I am passing the args wrong. I'm new to Scala so I'm not
> sure whether I'm reading the ClientArguments code right, but based on the
> comments in one of the files I think passing --args multiple times is the
> right way to do it.
>
> And just for good measure, this is what is being executed by YARN's
> launch_container.sh script:
>
> exec /bin/bash -c "$JAVA_HOME/bin/java -server -Xmx4096m
> -Djava.io.tmpdir=$PWD/tmp org.apache.spark.deploy.yarn.ApplicationMaster
> --class org.apache.spark.streaming.examples.HdfsWordCount --jar
> examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar
> --args  'yarn-standalone'  --args
>  'hdfs:///user/mpercy/hdfswordcount-test2'  --worker-memory 2048
> --worker-cores 1 --num-workers 3 1>
> /var/log/hadoop-yarn/container/application_1389304540039_0052/container_1389304540039_0052_01_000001/stdout
> 2>
> /var/log/hadoop-yarn/container/application_1389304540039_0052/container_1389304540039_0052_01_000001/stderr
>
> Would love to hear any suggestions for how to debug this further!
>
> Thanks,
> Mike
>
>
>
> On Thu, Jan 9, 2014 at 5:44 PM, Tathagata Das <tathagata.das1565@gmail.com
> > wrote:
>
>> If you have been able to run SparkPi on YARN, then you should be
>> able to run the streaming example HdfsWordCount<https://github.com/apache/incubator-spark/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/HdfsWordCount.scala> as
>> well. Even though the instructions in the example say to run it on a local
>> machine, you can run it on YARN as well, in the same way as SparkPi.
>> You would just have to give the appropriate Spark master URL and use an
>> HDFS directory as the 2nd parameter. Then any text file written to that
>> HDFS directory will get "word counted".
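>>
>> (For example, you could pass "yarn-standalone" as the master and something
>> like "hdfs://myhdfs:9000/mydir/" as the 2nd argument; the path here is
>> just a placeholder.)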
>>
>> Note that you should add a file to that HDFS directory by moving it there
>> from some other directory, so that it appears atomically (the stream
>> shouldn't see a half-written file). For example, if the HDFS directory
>> that you want to use to run the example is hdfs://myhdfs:9000/mydir/,
>> then you can first copy a local file (say, new_file) to
>> hdfs://myhdfs:9000/temp_location/new_file and then move it to
>> hdfs://myhdfs:9000/mydir/new_file.
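>>
>> Concretely, the copy-then-move could look like this (assuming new_file is
>> in your local working directory):
>>
>> hadoop fs -put new_file hdfs://myhdfs:9000/temp_location/new_file
>> hadoop fs -mv hdfs://myhdfs:9000/temp_location/new_file hdfs://myhdfs:9000/mydir/new_file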
>>
>>
>>
>>
>> On Thu, Jan 9, 2014 at 5:29 PM, Mike Percy <mpercy@apache.org> wrote:
>>
>>> After looking through the docs, grepping the commit logs and looking on
>>> the list archives, I have been unable to see an indication or example of
>>> Spark streaming working on YARN. Is this possible yet? So far, I've gotten
>>> at least the Spark Pi example to run on YARN with CDH5 beta 1.
>>>
>>> I am about to dig into the code and try to figure out how the batch YARN
>>> client works, to see how much work it would be to set up an AM to run an
>>> InputDStream, but I thought I'd make it easy on myself and ask here first
>>> before I got started.
>>>
>>> Thanks in advance for any pointers,
>>> Mike
>>>
>>>
>>
>
