Hi,

I've been following this thread for a while. 

I'm trying to introduce a test strategy in my team for testing a number of data pipelines before they go to production. I have watched Lars' presentation and found it great. However, I'm debating whether unit tests are worth the effort if there are already good job-level and pipeline-level tests. Does anybody have experience of benefiting from unit tests in such a case?

Cheers,
Shiv

On Mon, Dec 12, 2016 at 6:00 AM, Juan Rodríguez Hortalá <juan.rodriguez.hortala@gmail.com> wrote:
Hi all,

I would also like to participate in that.

Greetings,

Juan

On Fri, Dec 9, 2016 at 6:03 AM, Michael Stratton <michael.stratton@komodohealth.com> wrote:
That sounds great, please include me so I can get involved.

On Fri, Dec 9, 2016 at 7:39 AM, Marco Mistroni <mmistroni@gmail.com> wrote:
Me too, as I spent most of my time writing unit/integration tests. Please advise on where I can start.
Kr

On 9 Dec 2016 12:15 am, "Miguel Morales" <therevoltingx@gmail.com> wrote:
I would be interested in contributing. I've created my own library for this as well. In my blog post I talk about testing with Spark in RSpec style:
https://medium.com/@therevoltingx/test-driven-development-w-apache-spark-746082b44941


On Dec 8, 2016, at 4:09 PM, Holden Karau <holden@pigscanfly.ca> wrote:

There are also libraries designed to simplify testing Spark on the various platforms: spark-testing-base for Scala/Java/Python (and a video, https://www.youtube.com/watch?v=f69gSGSLGrY), sscheck (Scala-focused, property-based), and pyspark.test (Python-focused, with py.test instead of unittest2; see also the blog post from Nextdoor, https://engblog.nextdoor.com/unit-testing-apache-spark-with-py-test-3b8970dc013b#.jw3bdcej9 ).

Good luck on your Spark Adventures :)

P.S.

If anyone is interested in helping improve Spark testing libraries, I'm always looking for more people to be involved with spark-testing-base because I'm lazy :p

On Thu, Dec 8, 2016 at 2:05 PM, Lars Albertsson <lalle@mapflat.com> wrote:
I wrote some advice in a previous post on the list:
http://markmail.org/message/bbs5acrnksjxsrrs

It does not mention Python, but the strategy advice is the same. Just
replace JUnit/Scalatest with pytest, unittest, or your favourite
Python test framework.
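
As a rough illustration of what that swap could look like in Python, here
is a minimal sketch using the standard-library unittest module. The
function names (parse_event, count_by_user) and the "user,action" line
format are hypothetical and not taken from the earlier post. The sketch
assumes the job's logic is kept in plain functions so that it can be
tested without a Spark context; wiring those functions into a real job is
left out so that the example runs without Spark installed.

```python
import unittest


def parse_event(line):
    """Turn a 'user,action' text line into a (user, 1) pair.

    Hypothetical record format. In a Spark job a function like this
    could be passed to rdd.map(parse_event).
    """
    user, _action = line.strip().split(",", 1)
    return (user, 1)


def count_by_user(lines):
    """Local stand-in for the job logic: count events per user.

    In Spark the summing step would typically be a reduceByKey on the
    parsed pairs; here it is done with a plain dict so the logic can be
    checked on a small in-memory list.
    """
    counts = {}
    for user, n in map(parse_event, lines):
        counts[user] = counts.get(user, 0) + n
    return counts


class CountByUserTest(unittest.TestCase):
    def test_parse_event_keeps_only_the_user(self):
        self.assertEqual(parse_event("alice,click\n"), ("alice", 1))

    def test_counts_events_per_user(self):
        lines = ["alice,click", "bob,view", "alice,view"]
        self.assertEqual(count_by_user(lines), {"alice": 2, "bob": 1})

    def test_empty_input_gives_empty_result(self):
        self.assertEqual(count_by_user([]), {})
```

Saved to a file, the tests can be run with "python -m unittest <file>" or
picked up by pytest, which also collects unittest.TestCase classes.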


I recently held a presentation on the subject. There is a video
recording at https://vimeo.com/192429554 and slides at
http://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458

You can find more material on test strategies at
http://www.mapflat.com/lands/resources/reading-list/index.html




Lars Albertsson
Data engineering consultant
www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109
Calendar: https://goo.gl/6FBtlS, https://freebusy.io/lalle@mapflat.com


On Thu, Dec 8, 2016 at 4:14 PM, pseudo oduesp <pseudo20140@gmail.com> wrote:
> Can someone tell me how I can write unit tests for PySpark?
> (book, tutorial ...)
