spark-dev mailing list archives

From Grega Kešpret <gr...@celtra.com>
Subject Re: How do you run Spark jobs?
Date Tue, 13 Aug 2013 07:55:26 GMT
Hey Evan,
any chance you might find the link to the above mentioned SBT recipe?
Would greatly appreciate it.

Thanks,
Grega

On Fri, Aug 9, 2013 at 10:00 AM, Evan Chan <ev@ooyala.com> wrote:

> Hey Patrick,
>
> A while back I posted an SBT recipe that lets users build Scala job
> assemblies which exclude Spark and its dependencies, which I believe is
> what most people want. This lets you include your own libraries while
> excluding Spark's, for the smallest possible assembly.
>
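[The recipe itself isn't reproduced in this thread. A minimal sketch of what
it might look like, assuming sbt-assembly: mark Spark "provided" so it and
its transitive dependencies stay out of the fat jar. Coordinates match the
Spark 0.7.x line; plugin and version details will vary.

    // build.sbt -- sketch; needs the sbt-assembly plugin in project/plugins.sbt
    import AssemblyKeys._

    assemblySettings

    name := "my-spark-job"

    scalaVersion := "2.9.3"

    // "provided": on the compile classpath but excluded from the assembly;
    // the cluster supplies Spark and its deps at runtime.
    libraryDependencies +=
      "org.spark-project" % "spark-core_2.9.3" % "0.7.3" % "provided"

Running `sbt assembly` then yields a jar containing only the job's own code
and libraries.]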
> We don't use Spark's run script; instead we have SBT configured so that you
> can simply type "run" to run jobs. I believe this gives maximum developer
> velocity. We also have "sbt console" hooked up so that you can run the
> Spark shell from it (no need for the ./spark-shell script).
>
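[A sketch of the kind of wiring that makes this work; the values and settings
here are assumptions, not Evan's actual configuration. Package names match
the pre-0.8 `spark` namespace.

    // build.sbt additions -- sketch
    // Fork "run" into a separate JVM so heap settings apply to the job.
    fork in run := true

    javaOptions in run += "-Xmx2g"

    // Give "sbt console" a live SparkContext, approximating ./spark-shell.
    initialCommands in console := """
      import spark.SparkContext
      import spark.SparkContext._
      val sc = new SparkContext("local[4]", "console")
      """

One caveat: a dependency marked "provided" for assembly purposes drops out of
sbt's runtime classpath, so setups like this typically add Spark back to the
run/console classpath.]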
> And, as you know, we are going to contribute back a job server. We believe
> that for most organizations this will provide the easiest way of submitting
> and managing jobs -- IT/Ops sets up Spark as an HTTP service (using the job
> server), and users/developers submit jobs to a managed service. We even
> have a giter8 template to make creating jobs for the job server super
> simple. The template has support for local runs, the Spark shell, assembly,
> and testing.
>
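[The job server's interface hadn't been contributed yet when this was
written, so the following is only a guess at the shape of such a job: the
server owns a SparkContext and invokes user code through a small trait.
Trait and method names below are hypothetical.

    import spark.SparkContext
    import spark.SparkContext._

    // Hypothetical contract: jobs receive a shared, server-managed context
    // rather than constructing their own.
    trait ServerJob {
      def runJob(sc: SparkContext, args: Map[String, String]): Any
    }

    object WordCount extends ServerJob {
      def runJob(sc: SparkContext, args: Map[String, String]): Any =
        sc.textFile(args("input"))
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
          .take(10)
    }
]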
> So anyway, I believe we'll have a lot to contribute to your guide -- both
> now and especially once the job server is contributed. Feel free to touch
> base offline.
>
> -Evan
>
> On Fri, Aug 2, 2013 at 9:50 PM, Patrick Wendell <pwendell@gmail.com>
> wrote:
>
> > Hey All,
> >
> > I'm working on SPARK-800 [1]. The goal is to document a best practice or
> > recommended way of bundling and running Spark jobs. We have a quickstart
> > guide for writing a standalone job, but it doesn't cover how to deal with
> > packaging up your dependencies and setting the correct environment
> > variables required to submit a full job to a cluster. This can be a
> > confusing process for beginners, so it would be good to extend the guide
> > to cover this.
> >
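[For reference, the quickstart-era standalone job wiring this refers to
passes the master URL, SPARK_HOME, and the job's jar(s) straight into the
SparkContext constructor (0.7.x API). The master URL and jar path below are
placeholders.

    import spark.SparkContext

    object MyJob {
      def main(args: Array[String]) {
        val sc = new SparkContext(
          "spark://master:7077",               // cluster master (placeholder)
          "MyJob",
          System.getenv("SPARK_HOME"),         // must point at the Spark install
          Seq("target/my-job-assembly.jar"))   // jars shipped to the workers
        println(sc.parallelize(1 to 1000).count())
        sc.stop()
      }
    }
]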
> > First, though, I wanted to sample this list and see how people tend to
> > run Spark jobs inside their organizations. Knowing any of the following
> > would be helpful:
> >
> > - Do you create an uber jar with all of your job's (and Spark's)
> >   recursive dependencies?
> > - Do you try to use sbt run or maven exec with some way to pass the
> >   correct environment variables?
> > - Do you use a modified version of Spark's own `run` script?
> > - Do you have some other way of submitting jobs?
> >
> > Any notes would be helpful in compiling this!
> >
> > [1] https://spark-project.atlassian.net/browse/SPARK-800
> >
>
>
>
> --
> Evan Chan
> Staff Engineer
> ev@ooyala.com
> http://www.ooyala.com/
>
