spark-user mailing list archives

From Andrew Or <and...@databricks.com>
Subject Re: Submitting Spark Applications using Spark Submit
Date Mon, 22 Jun 2015 21:38:03 GMT
Did you restart your master / workers? On the master node, run
`sbin/stop-all.sh` followed by `sbin/start-all.sh`
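
The restart, plus a quick check that the master is accepting connections again, could be sketched roughly as below. This is only a sketch under assumptions and not part of the original message: the function names `restart_cluster` and `master_reachable` are hypothetical, the `/root/spark` default for `SPARK_HOME` assumes the usual spark-ec2 layout, and the port check relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command being available.

```shell
# Sketch: restart a standalone Spark cluster, then probe the master port.
# Assumption: Spark lives in $SPARK_HOME (defaulting to /root/spark).

# Stop and then start the master and all workers; run this on the master node.
restart_cluster() {
  local spark_home="${SPARK_HOME:-/root/spark}"
  "$spark_home/sbin/stop-all.sh" && "$spark_home/sbin/start-all.sh"
}

# Return 0 if a TCP connection to host:port opens within 5 seconds.
# The port defaults to 7077, the one shown in the quoted log.
master_reachable() {
  local host="$1"
  local port="${2:-7077}"
  timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

After `restart_cluster`, calling `master_reachable <master-hostname> 7077` from the machine that runs spark-submit should return 0. If it still fails, the "Connection refused" in the quoted log most likely means the master process is not listening.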

2015-06-20 17:59 GMT-07:00 Raghav Shankar <raghav0110.cs@gmail.com>:

> Hey Andrew,
>
> I tried the following approach: I modified my Spark build on my local
> machine. I downloaded the Spark 1.4.0 source code and then made a change
> to ResultTask.scala (a simple change to see if it works: I added a print
> statement). Then I built Spark using
>
> mvn -Dhadoop.version=1.0.4 -Phadoop-1 -DskipTests -Dscala-2.10 clean package
>
> Now, the new assembly jar was built. I started my EC2 Cluster using this
> command:
>
> ./ec2/spark-ec2 -k key -i ../aggr/key.pem --instance-type=m3.medium
> --zone=us-east-1b -s 9 launch spark-cluster
>
> I initially launched my application jar and it worked fine. After that I
> scp’d the new assembly jar to the Spark lib directory of all my EC2 nodes.
> When I ran the jar again I got the following error:
>
> 15/06/21 00:42:51 INFO AppClient$ClientActor: Connecting to master
> akka.tcp://sparkMaster@ec2-XXX.compute-1.amazonaws.com:7077/user/Master...
> 15/06/21 00:42:52 WARN Remoting: Tried to associate with unreachable
> remote address [akka.tcp://sparkMaster@ec2-XXX.compute-1.amazonaws.com:7077].
> Address is now gated for 5000 ms, all messages to this address will be
> delivered to dead letters. Reason: Connection refused:
> ec2-XXX.compute-1.amazonaws.com/10.165.103.16:7077
> 15/06/21 00:42:52 WARN AppClient$ClientActor: Could not connect to
> akka.tcp://sparkMaster@ec2-XXX.compute-1.amazonaws.com:7077: akka.remote.InvalidAssociation:
> Invalid address:
> akka.tcp://sparkMaster@ec2-XXX.compute-1.amazonaws.com:7077
> 15/06/21 00:43:11 INFO AppClient$ClientActor: Connecting to master
> akka.tcp://sparkMaster@ec2-XXX.compute-1.amazonaws.com:7077/user/Master...
> 15/06/21 00:43:11 WARN AppClient$ClientActor: Could not connect to
> akka.tcp://sparkMaster@ec2-XXX.compute-1.amazonaws.com:7077: akka.remote.InvalidAssociation:
> Invalid address:
> akka.tcp://sparkMaster@ec2-XXX.compute-1.amazonaws.com:7077
> 15/06/21 00:43:11 WARN Remoting: Tried to associate with unreachable
> remote address [akka.tcp://sparkMaster@ec2-XXX.compute-1.amazonaws.com:7077].
> Address is now gated for 5000 ms, all messages to this address will be
> delivered to dead letters. Reason: Connection refused:
> ec2-XXX.compute-1.amazonaws.com/10.165.103.16:7077
> 15/06/21 00:43:31 INFO AppClient$ClientActor: Connecting to master
> akka.tcp://sparkMaster@ec2-XXX.compute-1.amazonaws.com:7077/user/Master...
> 15/06/21 00:43:31 WARN AppClient$ClientActor: Could not connect to
> akka.tcp://sparkMaster@ec2-XXX.compute-1.amazonaws.com:7077: akka.remote.InvalidAssociation:
> Invalid address: akka.tcp://sparkMaster@XXX.compute-1.amazonaws.com:7077
> 15/06/21 00:43:31 WARN Remoting: Tried to associate with unreachable
> remote address [akka.tcp://sparkMaster@ec2-XXX.compute-1.amazonaws.com:7077].
> Address is now gated for 5000 ms, all messages to this address will be
> delivered to dead letters. Reason: Connection refused:
> XXX.compute-1.amazonaws.com/10.165.103.16:7077
> 15/06/21 00:43:51 ERROR SparkDeploySchedulerBackend: Application has been
> killed. Reason: All masters are unresponsive! Giving up.
> 15/06/21 00:43:51 WARN SparkDeploySchedulerBackend: Application ID is not
> initialized yet.
> 15/06/21 00:43:51 INFO SparkUI: Stopped Spark web UI at
> http://XXX.compute-1.amazonaws.com:4040
> 15/06/21 00:43:51 INFO DAGScheduler: Stopping DAGScheduler
> 15/06/21 00:43:51 INFO SparkDeploySchedulerBackend: Shutting down all
> executors
> 15/06/21 00:43:51 INFO SparkDeploySchedulerBackend: Asking each executor
> to shut down
> 15/06/21 00:43:51 ERROR OneForOneStrategy:
> java.lang.NullPointerException
> at
> org.apache.spark.deploy.client.AppClient$ClientActor$$anonfun$receiveWithLogging$1.applyOrElse(AppClient.scala:160)
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> at
> org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
> at
> org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> at
> org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> at
> org.apache.spark.deploy.client.AppClient$ClientActor.aroundReceive(AppClient.scala:61)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
> at akka.dispatch.Mailbox.run(Mailbox.scala:220)
> at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> Also, in the above error it says: connection refused to
> ec2-XXX.compute-1.amazonaws.com/10.165.103.16:7077. I don’t understand
> where it gets the 10.165.103.16 from. I never specify that in the master
> URL command-line parameter. Any ideas on what I might be doing wrong?
>
>
> On Jun 19, 2015, at 7:19 PM, Andrew Or <andrew@databricks.com> wrote:
>
> Hi Raghav,
>
> I'm assuming you're using standalone mode. When using the Spark EC2
> scripts you need to make sure that every machine has the most up-to-date jars.
> Once you have built on one of the nodes, you must rsync the Spark directory
> to the rest of the nodes (see /root/spark-ec2/copy-dir).
>
> That said, I usually build it locally on my laptop and scp the assembly
> jar to the cluster instead of building it there. The EC2 machines often
> take much longer to build for some reason. Also, it's cumbersome to set up
> a proper IDE there.
>
> -Andrew
>
>
> 2015-06-19 19:11 GMT-07:00 Raghav Shankar <raghav0110.cs@gmail.com>:
> Thanks Andrew! Is this all I have to do when using the Spark EC2 script to
> set up a Spark cluster? It seems to be getting an assembly jar that is not
> from my project (perhaps from a Maven repo). Is there a way to make the EC2
> script use the assembly jar that I created?
>
> Thanks,
> Raghav
>
>
> On Friday, June 19, 2015, Andrew Or <andrew@databricks.com> wrote:
> Hi Raghav,
>
> If you want to make changes to Spark and run your application with it, you
> may follow these steps.
>
> 1. git clone git@github.com:apache/spark
> 2. cd spark; build/mvn clean package -DskipTests [...]
> 3. make local changes
> 4. build/mvn package -DskipTests [...] (no need to clean again here)
> 5. bin/spark-submit --master spark://[...] --class your.main.class your.jar
>
> No need to pass in extra --driver-java-options or --driver-class-path
> as others have suggested. When using spark-submit, the main jar comes from
> assembly/target/scala-2.10, which is prepared through "mvn package". You
> just have to make sure that you re-package the assembly jar after each
> modification.
>
> -Andrew
>
> 2015-06-18 16:35 GMT-07:00 maxdml <maxdml@cs.duke.edu>:
> You can specify the jars of your application to be included with
> spark-submit with the `--jars` switch.
>
> Otherwise, are you sure that your newly compiled spark jar assembly is in
> assembly/target/scala-2.10/?