On 13 Mar 2017, at 13:24, Sam Elamin <hussam.elamin@gmail.com> wrote:

Hi Jorn

Thanks for the prompt reply. We really have 2 main concerns with CD: ensuring tests pass and linting the code.

I'd add "providing diagnostics when tests fail", which is a combination of: tests providing useful information and CI tooling collecting all those results and presenting them meaningfully. The hard parts are invariably (at least for me)

-what to do about the intermittent failures
-tradeoff between thorough testing and fast testing, especially when thorough means "better/larger datasets"
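
For the first point, one possible starting place is simply detecting which tests are intermittent. The sketch below is illustrative only: it assumes you can export run history as (commit, test_name, passed) tuples, which is a hypothetical record format and not something any particular CI tool produces out of the box. A test is flagged when the same commit has seen it both pass and fail.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return names of tests that both passed and failed on the same commit.

    `runs` is an iterable of (commit, test_name, passed) tuples.
    This record format is an assumption made for illustration.
    """
    outcomes = defaultdict(set)
    for commit, test_name, passed in runs:
        # Collect the distinct outcomes seen for each (commit, test) pair.
        outcomes[(commit, test_name)].add(bool(passed))
    # Two distinct outcomes on one commit means the code did not change
    # but the result did, so the test is a candidate intermittent failure.
    flaky = {test for (_, test), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)
```

A test that fails consistently on a commit is not flagged, since that looks like a real regression rather than flakiness.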

You can consider the output of Jenkins & tests as data sources for your own analysis too: track failure rates over time, test runs over time, etc. That could be interesting. If you want to go there, then the question becomes which CI tooling produces the most interesting machine-parseable results, above and beyond the classic Ant-originated XML test run reports.
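
As a rough sketch of treating those XML reports as a data source, the snippet below summarizes one JUnit-style report using only the Python standard library. It assumes the common layout of `<testcase>` elements with optional `<failure>` or `<error>` children; the function name and the returned dictionary keys are made up for illustration, and real reports may carry extra attributes that are ignored here.

```python
import xml.etree.ElementTree as ET


def summarize_report(xml_text):
    """Count test cases and failures in a JUnit-style XML report string."""
    root = ET.fromstring(xml_text)
    total = 0
    failed = 0
    # iter() walks the whole tree, so this works whether the root is a
    # single <testsuite> or a <testsuites> wrapper around several suites.
    for case in root.iter("testcase"):
        total += 1
        if case.find("failure") is not None or case.find("error") is not None:
            failed += 1
    rate = failed / total if total else 0.0
    return {"tests": total, "failed": failed, "failure_rate": rate}
```

Running this over the report from each build and storing the result with the build number or date would give a simple failure-rate time series. Skipped tests are counted in the total here, which may or may not be what you want.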

I have mixed feelings about ScalaTest there: I think the expression language is good, but the Maven test runner doesn't report that well, at least for me.


I think all platforms should handle this with ease, I was just wondering what people are using.

Jenkins seems to have the best Spark plugins, so we are investigating that as well as a variety of other hosted CI tools.

Happy to write a blog post detailing our findings and share it here if people are interested.


On Mon, Mar 13, 2017 at 1:18 PM, Jörn Franke <jornfranke@gmail.com> wrote:

Jenkins also now supports pipeline as code and multibranch pipelines. Thus you are not so dependent on the UI, and you no longer need a long list of jobs for different branches. Additionally, it has a new UI (beta) called Blue Ocean, which is a little bit nicer. You may also check GoCD. Aside from this you have a huge variety of commercial tools, e.g. Bamboo.
In the cloud, I use Travis CI for my open source GitHub projects, but there are also a lot of alternatives, e.g. Distelli.

It really depends on what you expect, e.g. whether you want to version the build pipeline in Git, whether you need Docker deployment, etc. I am not sure that new starters should be responsible for the build pipeline, so I am not sure that I understand your concern in this area.

From my experience, integration tests for Spark can be run on any of these platforms.

Best regards

> On 13 Mar 2017, at 10:55, Sam Elamin <hussam.elamin@gmail.com> wrote:
> Hi Folks
> This is more of a general question. What's everyone using for their CI/CD when it comes to Spark?
> We are using PySpark but potentially looking to move to Spark with Scala and sbt in the future.
> One of the suggestions was Jenkins, but I know the UI isn't great for new starters so I'd rather avoid it. I've used TeamCity, but that was more focused on .NET development.
> What are people using?
> Kind Regards
> Sam