spark-dev mailing list archives

From: Patrick Wendell <pwend...@gmail.com>
Subject: How do you run Spark jobs?
Date: Sat, 03 Aug 2013 04:50:21 GMT
Hey All,

I'm working on SPARK-800 [1]. The goal is to document a best practice or
recommended way of bundling and running Spark jobs. We have a quickstart
guide for writing a standalone job, but it doesn't cover how to package
up your dependencies or set the environment variables required to submit
a full job to a cluster. This can be a confusing process for beginners,
so it would be good to extend the guide to cover it.
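
To make the question concrete, here's roughly the kind of job I have in
mind. This is just a minimal sketch assuming the current 0.7-era API;
the class name, jar path, and use of the MASTER and SPARK_HOME
environment variables are chosen for illustration, not a recommendation:

    // Minimal standalone job sketch (Spark 0.7-era `spark` package).
    // MASTER, SPARK_HOME, and the jar path are illustrative assumptions.
    import spark.SparkContext

    object MyJob {
      def main(args: Array[String]) {
        val master = sys.env.getOrElse("MASTER", "local")     // e.g. spark://host:7077
        val sparkHome = System.getenv("SPARK_HOME")           // Spark install on the cluster
        val jobJar = "target/scala-2.9.3/my-job-assembly.jar" // uber jar shipped to workers
        // Pass the assembled jar so workers can load the job's classes.
        val sc = new SparkContext(master, "MyJob", sparkHome, Seq(jobJar))
        val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
        println("Even numbers: " + evens)
        sc.stop()
      }
    }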

First, though, I wanted to survey this list and see how people tend to
run Spark jobs inside their orgs. Knowing any of the following would be
helpful:

- Do you create an uber jar with all of your job's (and Spark's)
recursive dependencies? (A rough sketch of this approach follows the
list.)
- Do you use `sbt run` or Maven's exec plugin, with some way to pass the
correct environment variables?
- Do you use a modified version of Spark's own `run` script?
- Do you have some other way of submitting jobs?
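
For reference, here is a minimal sketch of the uber jar approach using
the sbt-assembly plugin. The plugin version, Spark coordinates, and the
"provided" scoping are assumptions on my part, not a recommendation:

    // project/plugins.sbt -- pull in sbt-assembly (version is an assumption)
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.9.1")

    // build.sbt -- sbt-assembly 0.9.x-style settings
    import AssemblyKeys._

    assemblySettings

    name := "my-job"

    scalaVersion := "2.9.3"

    // Marking Spark "provided" keeps the cluster's own Spark jars out of
    // the assembly, so the jar holds only the job's recursive dependencies.
    libraryDependencies += "org.spark-project" % "spark-core_2.9.3" % "0.7.3" % "provided"

Running `sbt assembly` then produces a single jar under target/ that can
be handed to the SparkContext constructor shown above.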

Any notes would be helpful in compiling this!

[1] https://spark-project.atlassian.net/browse/SPARK-800
