spark-dev mailing list archives

From    Evan Chan <...@ooyala.com>
Subject Re: How do you run Spark jobs?
Date Thu, 15 Aug 2013 17:21:20 GMT
Here it is:

https://groups.google.com/forum/?fromgroups=#!searchin/spark-users/SBT/spark-users/pHaF01sPwBo/faHr-fEAFbYJ


On Tue, Aug 13, 2013 at 12:55 AM, Grega Kešpret <grega@celtra.com> wrote:

> Hey Evan,
> any chance you might find the link to the above-mentioned SBT recipe?
> Would greatly appreciate it.
>
> Thanks,
> Grega
>
> On Fri, Aug 9, 2013 at 10:00 AM, Evan Chan <ev@ooyala.com> wrote:
>
> > Hey Patrick,
> >
> > A while back I posted an SBT recipe allowing users to build Scala job
> > assemblies that exclude Spark and its dependencies, which is what most
> > people want, I believe.  This allows you to include your own libraries
> > and exclude Spark's, for the smallest possible assembly.
> >
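(For illustration only; the recipe Evan links above may differ in detail.  One
common way to get an assembly that leaves Spark and its transitive
dependencies out is to mark spark-core as "provided", which sbt-assembly then
keeps off the assembled classpath.  The plugin version, artifact coordinates,
and project name below are assumptions for a Spark 0.7.x-era build, not taken
from this thread.)

    // project/plugins.sbt -- assumes the sbt-assembly plugin
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.1")

    // build.sbt
    import AssemblyKeys._

    assemblySettings

    name := "my-spark-job"        // placeholder project name

    scalaVersion := "2.9.3"

    // "provided" keeps spark-core (and everything it pulls in) out of the
    // assembly jar; the cluster's Spark installation supplies it at run time.
    libraryDependencies +=
      "org.spark-project" %% "spark-core" % "0.7.3" % "provided"

(Note that with Spark marked "provided", plain "sbt run" no longer sees Spark
on the runtime classpath, so recipes of this kind typically add it back for
the run/test configurations.)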
> > We don't use Spark's run script; instead we have SBT configured so that
> > you can simply type "run" to run jobs.  I believe this gives maximum
> > developer velocity.  We also have "sbt console" hooked up so that you can
> > run the Spark shell from it (no need for the ./spark-shell script).
> >
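(Again just a sketch of the idea rather than Evan's actual configuration:
sbt's initialCommands setting can pre-create a SparkContext so that
"sbt console" behaves like a small spark-shell.  The package names match the
pre-0.8 "spark" namespace; the master string and job name are placeholders.)

    // build.sbt fragment (sbt 0.12-style syntax)
    initialCommands in console := """
      import spark.SparkContext
      import spark.SparkContext._
      val sc = new SparkContext("local[4]", "sbt-console")  // placeholder master / name
    """

    // Stop the context cleanly when the REPL session ends.
    cleanupCommands in console := "sc.stop()"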
> > And, as you know, we are going to contribute back a job server.  We
> > believe that for most organizations this will provide the easiest way to
> > submit and manage jobs -- IT/OPS sets up Spark as an HTTP service (using
> > the job server), and users/developers can submit jobs to a managed
> > service.  We even have a giter8 template to make creating jobs for the
> > job server super simple.  The template has support for local run, spark
> > shell, assembly, and testing.
> >
> > So anyway, I believe we'll have a lot to contribute to your guide -- both
> > now and especially once the job server is contributed.  Feel free to
> > touch base offline.
> >
> > -Evan
> >
> >
> >
> >
> >
> > On Fri, Aug 2, 2013 at 9:50 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> >
> > > Hey All,
> > >
> > > I'm working on SPARK-800 [1]. The goal is to document a best practice
> > > or recommended way of bundling and running Spark jobs. We have a
> > > quickstart guide for writing a standalone job, but it doesn't cover how
> > > to deal with packaging up your dependencies and setting the correct
> > > environment variables required to submit a full job to a cluster. This
> > > can be a confusing process for beginners - it would be good to extend
> > > the guide to cover this.
> > >
> > > First, though, I wanted to sample this list and see how people tend to
> > > run Spark jobs inside their orgs. Knowing any of the following would be
> > > helpful:
> > >
> > > - Do you create an uber jar with all of your job's (and Spark's)
> > >   recursive dependencies?
> > > - Do you try to use sbt run or maven exec with some way to pass the
> > >   correct environment variables?
> > > - Do people use a modified version of Spark's own `run` script?
> > > - Do you have some other way of submitting jobs?
> > >
> > > Any notes would be helpful in compiling this!
> > >
> > > [1] https://spark-project.atlassian.net/browse/SPARK-800
> > >
> >
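(To make the packaging questions above concrete: with the 0.7-era API, the
list of jars handed to the SparkContext constructor is what actually ships
your code to the workers, which is why the uber-jar vs. excluded-dependencies
question matters.  This is a generic sketch; the master URL, jar path, and
input/output locations are placeholders, not anything from this thread.)

    // Minimal standalone job (pre-0.8 "spark" package namespace).
    import spark.SparkContext
    import spark.SparkContext._

    object WordCount {
      def main(args: Array[String]) {
        val sc = new SparkContext(
          "spark://master:7077",                // cluster master URL (placeholder)
          "WordCount",                          // job name
          System.getenv("SPARK_HOME"),          // Spark installation on the cluster
          List("target/scala-2.9.3/my-spark-job-assembly-0.1.jar"))  // jar(s) shipped to workers

        val counts = sc.textFile("hdfs:///data/input.txt")
                       .flatMap(_.split(" "))
                       .map(word => (word, 1))
                       .reduceByKey(_ + _)
        counts.saveAsTextFile("hdfs:///data/output")
        sc.stop()
      }
    }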
> >
> >
> > --
> > Evan Chan
> > Staff Engineer
> > ev@ooyala.com | http://www.ooyala.com/
>



-- 
Evan Chan
Staff Engineer
ev@ooyala.com | http://www.ooyala.com/
