spark-user mailing list archives

From Tom Graves <tgraves...@yahoo.com>
Subject Re: Spark (trunk/yarn) on CDH4.3.0.2 - YARN
Date Tue, 10 Sep 2013 01:12:43 GMT
You need to look at all of the logs.  Try running the yarn logs command.  That should give
you all of the logs for your application, assuming it is configured for log aggregation.
If that doesn't work, try going to the resource manager web UI, clicking on your application,
and viewing the logs for the application master at least.
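
For example (the application ID below is a placeholder; use the one the resource
manager assigned to your job):

    yarn logs -applicationId application_1378771200000_0001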

Tom

On Sep 9, 2013, at 7:57 PM, Vipul Pandey <vipandey@gmail.com> wrote:

> Oops, my apologies, I skipped that section in my excitement after seeing the example
> running. I can see my job being submitted now and executed, as I saw some print
> statements in the container running my app.
> 
> But my app fails for some reason and I'm not sure why. This is what I see in the
> error logs of my app container:
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> spark.SparkException: Job failed: Task 1 failed more than 4 times
> 	at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:544)
> 	at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:542)
> 
> 
> Any tips on how to track down the problem? 
> 
> 
> On Mon, Sep 9, 2013 at 10:46 AM, Tom Graves <tgraves_cs@yahoo.com> wrote:
>> You use yarn-standalone as the MASTER URL, so replace spark://a.b.c:7077 with yarn-standalone.
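>> 
>> For example, the constructor from your message below would become (a minimal sketch,
>> keeping your other arguments as they are):
>> 
>>     val sc = new SparkContext("yarn-standalone", "indexXformation", "", Seq())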
>> 
>> The important notes section of the YARN doc mentions it: https://github.com/mesos/spark/blob/master/docs/running-on-yarn.md
>> 
>> Tom
>> From: Vipul Pandey <vipandey@gmail.com>
>> To: user@spark.incubator.apache.org 
>> Sent: Monday, September 9, 2013 12:32 PM
>> Subject: Re: Spark (trunk/yarn) on CDH4.3.0.2 - YARN
>> 
>> Thanks for the tip - I'm building off of master and against CDH4.3.0 now (my
>> cluster is CDH4.3.0.2); the Apache Hadoop version is 2.0.0:
>> http://www.cloudera.com/content/cloudera-content/cloudera-docs/PkgVer/3.25.2013/CDH-Version-and-Packaging-Information/cdhvd_topic_3_1.html
>> 
>> After following the instructions in the doc below, here's what I found:
>> 
>> - SPARK_HADOOP_VERSION=2.0.0-cdh4.3.0 SPARK_YARN=true ./sbt/sbt assembly
>> This results in "module not found: org.apache.hadoop#hadoop-client;2.0.0-mr2-cdh4.3.0.2",
>> with the following as one of the warning messages:
>> [warn] ==== Cloudera Repository: tried
>> [warn]   http://repository.cloudera.com/artifactory/cloudera-repos/org/apache/hadoop/hadoop-client/2.0.0-mr2-cdh4.3.0.2/hadoop-client-2.0.0-mr2-cdh4.3.0.2.pom
>> 
>> I realized that they have made their repository secure now, so http won't work.
>> Changing it to https in SparkBuild.scala helps. Someone may want to make that change
>> and check it in.
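>> 
>> For reference, the changed resolver in SparkBuild.scala would look something like this
>> (a sketch; the resolver's exact name in the build file may differ):
>> 
>>     resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"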
>> 
>> Also, executing the assembly command above does not generate the example jars mentioned
>> in the directions. I had to run sbt package to get that jar and rerun the assembly.
>> 
>> 
>> I was able to run the example just fine. 
>> 
>> 
>> Now, the next question: how should I initialize my SparkContext for YARN?
>> This is what I had in standalone mode:
>>     val sc = new SparkContext("spark://a.b.c:7077", "indexXformation", "", Seq())
>> Do I do something here, or will the client pick up the YARN configuration from the
>> Hadoop config?
>> 
>> Vipul
>> 
>> 
>> 
>> On Fri, Sep 6, 2013 at 4:30 PM, Tom Graves <tgraves_cs@yahoo.com> wrote:
>> Which Spark branch are you building off of?
>> If using master branch follow the directions here: https://github.com/mesos/spark/blob/master/docs/running-on-yarn.md
>> 
>> Make sure to set your Hadoop version to CDH.
>> 
>> I'm not sure what the CDH versions map to in regular Apache Hadoop, but if it's newer
>> than Apache Hadoop 2.0.5-alpha then they changed the YARN APIs, so it won't work
>> without changes to the app master.
>> 
>> Tom
>> 
>> On Sep 6, 2013, at 5:37 PM, Vipul Pandey <vipandey@gmail.com> wrote:
>> 
>>> I'm unable to successfully run the SparkPi example in my YARN cluster. 
>>> I did whatever is specified here (didn't change anything anywhere): http://spark.incubator.apache.org/docs/0.7.0/running-on-yarn.html
>>> and added HADOOP_CONF_DIR as well. (BTW, on sbt/sbt assembly, the jar file it
>>> generates is spark-core-assembly-0.6.0.jar.)
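>>> 
>>> For reference, that export was along these lines; /etc/hadoop/conf is the usual CDH
>>> location, so adjust if your cluster keeps its Hadoop config elsewhere:
>>> 
>>>     export HADOOP_CONF_DIR=/etc/hadoop/conf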
>>> 
>>> I get the following exception in my container:
>>> 
>>> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
>>> 	at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
>>> 	at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:103)
>>> 	at spark.deploy.yarn.ApplicationMaster.registerApplicationMaster(ApplicationMaster.scala:123)
>>> 	at spark.deploy.yarn.ApplicationMaster.spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:52)
>>> 	at spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:42)
>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>> 	at javax.security.auth.Subject.doAs(Subject.java:396)
>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
>>> 	at spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:40)
>>> 	at spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:340)
>>> 	at spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>>> Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: callId, status; Host Details : local host is: "rd17d01ls-vm0109.rd.geo.apple.com/17.134.172.65"; destination host is: "rd17d01ls-vm0110.rd.geo.apple.com":8030;
>>> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:212)
>>> 	at $Proxy7.registerApplicationMaster(Unknown Source)
>>> 	at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:100)
>>> 	... 9 more
>>> Caused by: java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: callId, status; Host Details : local host is: "rd17d01ls-vm0109.rd.geo.apple.com/17.134.172.65"; destination host is: "rd17d01ls-vm0110.rd.geo.apple.com":8030;
>>> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
>>> 	at org.apache.hadoop.ipc.Client.call(Client.java:1239)
>>> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>> 	... 11 more
>>> Caused by: com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: callId, status
>>> 	at com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:81)
>>> 	at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto$Builder.buildParsed(RpcPayloadHeaderProtos.java:1094)
>>> 	at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto$Builder.access$1300(RpcPayloadHeaderProtos.java:1028)
>>> 	at org.apache.hadoop.ipc.protobuf.RpcPayloadHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcPayloadHeaderProtos.java:986)
>>> 	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:946)
>>> 	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
>>> 
>>> 
>>> 
>>> 
>>> Any solutions anyone? 
> 
