spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Spark and continuous integration
Date Tue, 14 Mar 2017 11:56:54 GMT
I agree the reporting is an important aspect. Sonarqube (or a similar tool) can report over time, but it does not support Scala (well, only indirectly via JaCoCo). In the end, you will need to think about a dashboard that displays results over time.
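If you go the JaCoCo route, one cheap way to get the "over time" view is to have each build reduce the XML report to a single timestamped row and append it somewhere you can chart. A minimal Scala sketch; the report path (the Maven default target/site/jacoco/jacoco.xml) and the scala-xml dependency are assumptions, adjust for your build:

    import javax.xml.parsers.SAXParserFactory
    import scala.xml.XML

    object CoverageTrend {
      def main(args: Array[String]): Unit = {
        // JaCoCo XML carries a DTD reference; disable external DTD loading so
        // the parser does not try to fetch report.dtd.
        val factory = SAXParserFactory.newInstance()
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
        val loader = XML.withSAXParser(factory.newSAXParser())

        // Path is an assumption; pass your own as the first argument.
        val report = loader.loadFile(args.headOption.getOrElse("target/site/jacoco/jacoco.xml"))

        // The report-level <counter type="LINE" .../> summarises the whole module.
        val line    = (report \ "counter").filter(c => (c \@ "type") == "LINE").head
        val covered = (line \@ "covered").toDouble
        val missed  = (line \@ "missed").toDouble

        // One row per build: timestamp, line coverage percentage.
        println(f"${java.time.Instant.now()},${100.0 * covered / (covered + missed)}%.1f")
      }
    }

A flat CSV plus any plotting tool is often enough to see the trend before you invest in Sonarqube dashboards.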

> On 14 Mar 2017, at 12:44, Steve Loughran <stevel@hortonworks.com> wrote:
> 
> 
>> On 13 Mar 2017, at 13:24, Sam Elamin <hussam.elamin@gmail.com> wrote:
>> 
>> Hi Jorn
>> 
>> Thanks for the prompt reply. Really we have 2 main concerns with CD: ensuring tests pass and linting the code.
> 
> I'd add "providing diagnostics when tests fail", which is a combination of: tests providing useful information and CI tooling collecting all those results and presenting them meaningfully. The hard parts are invariably (at least for me):
> 
> -what to do about the intermittent failures
> -tradeoff between thorough testing and fast testing, especially when thorough means "better/larger datasets"
> 
> You can consider the output of Jenkins & tests as data sources for your own analysis too: track failure rates over time, test runs over time, etc.; could be interesting. If you want to go there, then the question becomes which CI toolings produce the most interesting machine-parseable results, above and beyond the classic Ant-originated XML test run reports.
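Even those classic XML reports go a fair way: a step at the end of each build can fold the surefire/scalatest TEST-*.xml files into one row per run, which is enough raw material for tracking failure rates over time. A sketch in Scala; the report directory (the Maven surefire default) and the scala-xml dependency are assumptions:

    import java.io.File
    import scala.xml.XML

    object TestRunTrend {
      def main(args: Array[String]): Unit = {
        // Report directory is an assumption; pass your own as the first argument.
        val dir = new File(args.headOption.getOrElse("target/surefire-reports"))
        val reports = Option(dir.listFiles()).getOrElse(Array.empty[File])
          .filter(f => f.getName.startsWith("TEST-") && f.getName.endsWith(".xml"))

        // Each <testsuite> root carries its tests/failures/errors counts as attributes.
        val (tests, failed) = reports.foldLeft((0, 0)) { case ((t, f), file) =>
          val suite = XML.loadFile(file)
          val n   = (suite \@ "tests").toInt
          val bad = (suite \@ "failures").toInt + (suite \@ "errors").toInt
          (t + n, f + bad)
        }

        // One row per build: timestamp, total tests, failed tests.
        println(s"${java.time.Instant.now()},$tests,$failed")
      }
    }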
> 
> I have mixed feelings about scalatest there: I think the expression language is good, but the maven test runner doesn't report that well, at least for me:
> 
> https://steveloughran.blogspot.co.uk/2016/09/scalatest-thoughts-and-ideas.html
> 
> 
>> 
>> I think all platforms should handle this with ease; I was just wondering what people are using.
>> 
>> Jenkins seems to have the best Spark plugins, so we are investigating that as well as a variety of other hosted CI tools.
>> 
>> Happy to write a blog post detailing our findings and share it here if people are interested.
>> 
>> 
>> Regards
>> Sam
>> 
>>> On Mon, Mar 13, 2017 at 1:18 PM, Jörn Franke <jornfranke@gmail.com> wrote:
>>> Hi,
>>> 
>>> Jenkins also now supports pipeline as code and multibranch pipelines, so you are not so dependent on the UI and you no longer need a long list of jobs for different branches. Additionally it has a new UI (beta) called Blue Ocean, which is a little bit nicer. You may also check GoCD. Aside from this you have a huge variety of commercial tools, e.g. Bamboo.
>>> In the cloud, I use Travis CI for my open source GitHub projects, but there are also a lot of alternatives, e.g. Distelli.
>>> 
>>> It really depends on what you expect, e.g. whether you want to version the build pipeline in Git, whether you need Docker deployment, etc. I am not sure that new starters should be responsible for the build pipeline, so I am not sure I understand your concern in this area.
>>> 
>>> In my experience, integration tests for Spark can be run on any of these platforms.
>>> 
>>> Best regards
>>> 
>>> > On 13 Mar 2017, at 10:55, Sam Elamin <hussam.elamin@gmail.com> wrote:
>>> >
>>> > Hi Folks
>>> >
>>> > This is more of a general question. What's everyone using for their CI/CD when it comes to Spark?
>>> >
>>> > We are using PySpark but are potentially looking to move to Spark Scala and sbt in the future.
>>> >
>>> >
>>> > One of the suggestions was Jenkins, but I know the UI isn't great for new starters so I'd rather avoid it. I've used TeamCity, but that was more focused on .NET development.
>>> >
>>> >
>>> > What are people using?
>>> >
>>> > Kind Regards
>>> > Sam
>> 
> 
