spark-user mailing list archives

From Shrikar archak <shrika...@gmail.com>
Subject Re: How do you run your spark app?
Date Fri, 20 Jun 2014 18:03:22 GMT
Hi Shivani,

I use sbt-assembly to create a fat jar.
https://github.com/sbt/sbt-assembly

An example sbt build file is below.

import AssemblyKeys._ // put this at the top of the file

assemblySettings

mainClass in assembly := Some("FifaSparkStreaming")

name := "FifaSparkStreaming"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
  ("org.apache.spark" %% "spark-streaming-twitter" % "1.0.0")
    .exclude("org.eclipse.jetty.orbit", "javax.transaction")
    .exclude("org.eclipse.jetty.orbit", "javax.servlet")
    .exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish")
    .exclude("org.eclipse.jetty.orbit", "javax.activation")
    .exclude("com.esotericsoftware.minlog", "minlog"),
  ("net.debasishg" % "redisclient_2.10" % "2.12")
    .exclude("com.typesafe.akka", "akka-actor_2.10")
)

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
    case PathList("org", "apache", xs @ _*) => MergeStrategy.first
    case "application.conf" => MergeStrategy.concat
    case "unwanted.txt"     => MergeStrategy.discard
    case x => old(x)
  }
}


resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
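
To build the jar from this file, a minimal sketch follows. It assumes sbt-assembly's default output location of target/scala-<binary version>/<name>-assembly-<version>.jar; the function names are only illustrative, and the three variables mirror the name, version and scalaVersion settings above.

```shell
#!/bin/sh
# Sketch only: the values below mirror the build file above.
NAME="FifaSparkStreaming"
VERSION="1.0"
SCALA_BINARY="2.10"   # binary version of scalaVersion 2.10.4

# Print the path where sbt-assembly is assumed to write the fat jar
# (its default naming; an overridden jar name would change this).
assembly_jar_path() {
  printf 'target/scala-%s/%s-assembly-%s.jar\n' "$SCALA_BINARY" "$NAME" "$VERSION"
}

# Run the build, then list the jar to confirm it was produced.
# Nothing runs until this function is called.
build_fat_jar() {
  sbt assembly && ls -lh "$(assembly_jar_path)"
}
```

Calling build_fat_jar from the project root should leave the jar at the path printed by assembly_jar_path, which is the file passed to spark-submit.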


I run it as shown below.

LOCALLY:
1)  sbt 'run AP1z4IYraYm5fqWhITWArY53x Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6 115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014'

To submit to the cluster:

CLUSTER:
2) spark-submit --class FifaSparkStreaming \
     --master "spark://server-8-144:7077" \
     --driver-memory 2048 --deploy-mode cluster \
     FifaSparkStreaming-assembly-1.0.jar \
     AP1z4IYraYm5fqWhITWArY53x \
     Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6 \
     115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN \
     Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014
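
If you submit often, the cluster command can be wrapped in a small script. This is a rough sketch, not something from my actual setup: the submit function and the RUN switch are hypothetical, and the master URL, class and jar name are simply the values from the command above. By default it only prints the command it would run.

```shell
#!/bin/sh
# Sketch only: values copied from the cluster command above.
MASTER="spark://server-8-144:7077"
APP_CLASS="FifaSparkStreaming"
APP_JAR="FifaSparkStreaming-assembly-1.0.jar"

# Build the spark-submit command line; application arguments are
# passed through unchanged after the jar.
submit() {
  set -- spark-submit --class "$APP_CLASS" --master "$MASTER" \
         --driver-memory 2048 --deploy-mode cluster "$APP_JAR" "$@"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"                   # actually submit the job
  else
    printf '%s\n' "$*"     # dry run: print the command only
  fi
}
```

For example, `submit KEY1 KEY2 KEY3 KEY4 fifa fifa2014` (placeholder arguments) prints the full command, and `RUN=1 submit ...` executes it.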


Hope this helps.

Thanks,
Shrikar


On Fri, Jun 20, 2014 at 9:16 AM, Shivani Rao <raoshivani@gmail.com> wrote:

> Hello Michael,
>
> I have a quick question for you. Can you clarify the statement "build
> fat JARs and build dist-style TAR.GZ packages with launch scripts, JARs
> and everything needed to run a Job"? Can you give an example?
>
> I am using sbt assembly as well to create a fat jar, and supplying the
> Spark and Hadoop locations in the classpath. Inside the main() function
> where the Spark context is created, I use SparkContext.jarOfClass(this).toList
> to add the fat jar to my Spark context. However, I seem to be running into
> issues with this approach. I was wondering if you had any input, Michael.
>
> Thanks,
> Shivani
>
>
> On Thu, Jun 19, 2014 at 10:57 PM, Sonal Goyal <sonalgoyal4@gmail.com>
> wrote:
>
>> We use maven for building our code and then invoke spark-submit through
>> the exec plugin, passing in our parameters. Works well for us.
>>
>> Best Regards,
>> Sonal
>> Nube Technologies <http://www.nubetech.co>
>>
>> <http://in.linkedin.com/in/sonalgoyal>
>>
>>
>>
>>
>> On Fri, Jun 20, 2014 at 3:26 AM, Michael Cutler <michael@tumra.com>
>> wrote:
>>
>>> P.S. Last but not least, we use sbt-assembly to build fat JARs and build
>>> dist-style TAR.GZ packages with launch scripts, JARs and everything needed
>>> to run a Job.  These are automatically built from source by our Jenkins and
>>> stored in HDFS.  Our Chronos/Marathon jobs fetch the latest release TAR.GZ
>>> directly from HDFS, unpack it and launch the appropriate script.
>>>
>>> Packaging everything required in one go makes for much cleaner
>>> development / testing / deployment than relying on cluster-specific
>>> classpath additions or any add-jars functionality.
>>>
>>>
>>> On 19 June 2014 22:53, Michael Cutler <michael@tumra.com> wrote:
>>>
>>>> When you start seriously using Spark in production there are basically
>>>> two things everyone eventually needs:
>>>>
>>>>    1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
>>>>    2. Always-On Jobs - that require monitoring, restarting etc.
>>>>
>>>> There are lots of ways to implement these requirements, everything from
>>>> crontab through to workflow managers like Oozie.
>>>>
>>>> We opted for the following stack:
>>>>
>>>>    - Apache Mesos <http://mesosphere.io/> (mesosphere.io distribution)
>>>>
>>>>
>>>>    - Marathon <https://github.com/mesosphere/marathon> - init/control
>>>>    system for starting, stopping, and maintaining always-on applications.
>>>>
>>>>
>>>>    - Chronos <http://airbnb.github.io/chronos/> - general-purpose
>>>>    scheduler for Mesos, supports job dependency graphs.
>>>>
>>>>
>>>>    - ** Spark Job Server <https://github.com/ooyala/spark-jobserver> -
>>>>    primarily for its ability to reuse shared contexts with multiple jobs
>>>>
>>>> The majority of our jobs are periodic (batch) jobs run through
>>>> spark-submit, and we have several always-on Spark Streaming jobs (also run
>>>> through spark-submit).
>>>>
>>>> We always use "client mode" with spark-submit because the Mesos cluster
>>>> has direct connectivity to the Spark cluster, and it means all the Spark
>>>> stdout/stderr is externalised into Mesos logs, which helps in diagnosing
>>>> problems.
>>>>
>>>> I thoroughly recommend you explore using Mesos/Marathon/Chronos to run
>>>> Spark and manage your Jobs; the Mesosphere tutorials are awesome and you
>>>> can be up and running in literally minutes.  The web UIs for both make it
>>>> easy to get started without talking to REST APIs etc.
>>>>
>>>> Best,
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>>
>>>> On 19 June 2014 19:44, Evan R. Sparks <evan.sparks@gmail.com> wrote:
>>>>
>>>>> I use SBT, create an assembly, and then add the assembly jars when I
>>>>> create my spark context. The main executor I run with something like
>>>>> "java -cp ... MyDriver".
>>>>>
>>>>> That said - as of spark 1.0 the preferred way to run spark
>>>>> applications is via spark-submit -
>>>>> http://spark.apache.org/docs/latest/submitting-applications.html
>>>>>
>>>>>
>>>>> On Thu, Jun 19, 2014 at 11:36 AM, ldmtwo <ldmtwo@gmail.com> wrote:
>>>>>
>>>>>> I want to ask this, not because I can't read endless documentation and
>>>>>> several tutorials, but because there seem to be many ways of doing
>>>>>> things and I keep having issues. How do you run /your/ spark app?
>>>>>>
>>>>>> I had it working when I was only using yarn+hadoop1 (Cloudera), then
>>>>>> I had to get Spark and Shark working and ended up upgrading everything
>>>>>> and dropped CDH support. Anyway, this is what I used with
>>>>>> master=yarn-client and app_jar being Scala code compiled with Maven.
>>>>>>
>>>>>> java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER \
>>>>>>   $CLASSNAME $ARGS
>>>>>>
>>>>>> Do you use this, or something else? I could never figure out this
>>>>>> method:
>>>>>> SPARK_HOME/bin/spark jar APP_JAR ARGS
>>>>>>
>>>>>> For example:
>>>>>> bin/spark-class jar \
>>>>>>   /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \
>>>>>>   pi 10 10
>>>>>>
>>>>>> Do you use SBT or Maven to compile, or something else?
>>>>>>
>>>>>>
>>>>>> ** It seems that I can't get subscribed to the mailing list, and I
>>>>>> tried both my work and personal email.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-run-your-spark-app-tp7935.html
>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>> Nabble.com.
>>>>>>
