spark-user mailing list archives

From Steve Loughran
Subject Re: Spark and continuous integration
Date Tue, 14 Mar 2017 11:44:56 GMT

On 13 Mar 2017, at 13:24, Sam Elamin wrote:

Hi Jorn

Thanks for the prompt reply. We really have 2 main concerns with CD: ensuring tests pass
and linting the code.

I'd add "providing diagnostics when tests fail", which is a combination of: tests providing
useful information and CI tooling collecting all those results and presenting them meaningfully.
The hard parts are invariably (at least for me)

-what to do about the intermittent failures
-tradeoff between thorough testing and fast testing, especially when thorough means "better/larger

You can consider the output of Jenkins & tests as data sources for your own analysis too:
track failure rates over time, test runs over time, etc. Could be interesting. If you want
to go there, then the question becomes: which CI tools produce the most interesting machine-parseable
results, above and beyond the classic Ant-originated XML test run reports?
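
As a rough sketch of that kind of analysis (my illustration, not anything a particular CI tool ships): assuming each run leaves behind JUnit-style XML reports with <testcase> elements carrying classname and name attributes, and <failure>, <error> or <skipped> children, something like the following could compute a per-test failure rate across runs. The function names are hypothetical, and real reports may differ in detail between test runners.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def parse_report(xml_text):
    """Yield (test_id, passed) for every <testcase> in one JUnit-style report.

    Assumption: a <failure> or <error> child marks a failed test, and
    tests with a <skipped> child are left out of the statistics.
    """
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so both <testsuite> and <testsuites> roots work.
    for case in root.iter("testcase"):
        if case.find("skipped") is not None:
            continue
        test_id = "{}.{}".format(case.get("classname", ""), case.get("name", ""))
        failed = case.find("failure") is not None or case.find("error") is not None
        yield test_id, not failed


def failure_rates(reports):
    """Map test_id -> fraction of runs in which that test failed."""
    runs = defaultdict(int)
    failures = defaultdict(int)
    for xml_text in reports:
        for test_id, passed in parse_report(xml_text):
            runs[test_id] += 1
            if not passed:
                failures[test_id] += 1
    return {test_id: failures[test_id] / runs[test_id] for test_id in runs}


def intermittent(rates):
    """Tests that both passed and failed across the runs: flakiness candidates."""
    return sorted(test_id for test_id, rate in rates.items() if 0 < rate < 1)
```

Feeding it the text of the report files archived from each build gives a first cut at "failure rates over time"; anything returned by intermittent() is only a candidate for the intermittent-failure pile, since a genuine regression followed by a fix looks the same in these numbers.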

I have mixed feelings about ScalaTest there: I think the expression language is good, but
the Maven test runner doesn't report that well, at least for me:

I think all platforms should handle this with ease; I was just wondering what people are using.

Jenkins seems to have the best Spark plugins, so we are investigating that as well as a variety
of other hosted CI tools.

Happy to write a blog post detailing our findings and share it here if people are interested.


On Mon, Mar 13, 2017 at 1:18 PM, Jörn Franke wrote:

Jenkins now also supports pipeline as code and multibranch pipelines. Thus you are not so
dependent on the UI and you no longer need a long list of jobs for different branches.
Additionally, it has a new UI (beta) called Blue Ocean, which is a little bit nicer. You may
also check GoCD. Aside from this you have a huge variety of commercial tools, e.g. Bamboo.
In the cloud, I use Travis CI for my open source GitHub projects, but there are also a lot
of alternatives, e.g. Distelli.

It really depends on what you expect, e.g. whether you want to version the build pipeline in Git,
whether you need Docker deployment, etc. I am not sure that new starters should be responsible for
the build pipeline, so I am not sure that I understand your concern in this area.

From my experience, integration tests for Spark can be run on any of these platforms.

Best regards

> On 13 Mar 2017, at 10:55, Sam Elamin wrote:
> Hi Folks
> This is more of a general question. What's everyone using for their CI/CD when it comes
> to Spark?
> We are using PySpark but potentially looking to move to Spark with Scala and sbt in the future.
> One of the suggestions was Jenkins, but I know the UI isn't great for new starters, so
> I'd rather avoid it. I've used TeamCity, but that was more focused on .NET development.
> What are people using?
> Kind Regards
> Sam
