spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: unit testing in spark
Date Mon, 10 Apr 2017 14:32:32 GMT

I think in the end you need to check the coverage of your application. If your application
is well covered at the job or pipeline level (which, however, depends on how you implement
these tests), then that can be fine.
In the end it really depends on the data and on what kind of transformations you implement.
For example, if 90% of your job consists of standard transformations but 10% are more or less
complex custom functions, then it might be worth testing those functions with many different
data inputs as unit tests, in addition to having integrated job/pipeline tests.
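As a minimal sketch of the unit-test side of this advice, assuming Python (as in the PySpark part of this thread) and a hypothetical custom function `normalize_country_code` that a job might wrap with `pyspark.sql.functions.udf`, the custom logic can live in a plain function and be checked against many inputs without starting Spark:

```python
import re
from typing import Optional


def normalize_country_code(raw: Optional[str]) -> Optional[str]:
    """Hypothetical custom function that a Spark job might wrap in a UDF.

    Trims whitespace, upper-cases, and keeps only two-letter alphabetic
    codes; anything else (including None) maps to None.
    """
    if raw is None:
        return None
    cleaned = raw.strip().upper()
    if re.fullmatch(r"[A-Z]{2}", cleaned):
        return cleaned
    return None


# Many small inputs, including edge cases, checked without a SparkSession.
CASES = [
    ("se", "SE"),
    ("  de ", "DE"),
    ("US", "US"),
    ("", None),
    ("   ", None),
    ("usa", None),
    ("1a", None),
    (None, None),
]


def test_normalize_country_code():
    for raw, expected in CASES:
        actual = normalize_country_code(raw)
        assert actual == expected, f"{raw!r}: expected {expected!r}, got {actual!r}"


if __name__ == "__main__":
    test_normalize_country_code()
    print(f"{len(CASES)} cases passed")
```

Keeping the function free of Spark objects is what makes these tests fast; the job/pipeline tests still cover the wiring around it.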

> On 10. Apr 2017, at 15:46, Gokula Krishnan D <email2dgk@gmail.com> wrote:
> 
> Hello Shiv, 
> 
> Unit testing really helps when you follow a TDD approach. It is a safe way to develop a
> program locally, and you can also run those test cases during the build process with
> any continuous integration tool (Bamboo, Jenkins). That way you can ensure that artifacts
> are tested before they are deployed to the cluster.
> 
> 
> Thanks & Regards, 
> Gokula Krishnan (Gokul)
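A minimal sketch of such a test, assuming Python's built-in `unittest` module and a hypothetical transformation `latest_per_key`, written as a plain function so that it runs locally without a cluster:

```python
import unittest


def latest_per_key(records):
    """Hypothetical transformation: keep the value with the highest timestamp per key.

    Written as a plain function over (key, timestamp, value) tuples so it can be
    tested without a cluster; a Spark job could apply the same max-by-timestamp
    logic per key (for example with reduceByKey).
    """
    latest = {}
    for key, ts, value in records:
        # On equal timestamps, the first record seen is kept.
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, value)
    return {key: value for key, (ts, value) in latest.items()}


class LatestPerKeyTest(unittest.TestCase):
    def test_keeps_newest_value(self):
        records = [("a", 1, "old"), ("a", 3, "new"), ("b", 2, "only")]
        self.assertEqual(latest_per_key(records), {"a": "new", "b": "only"})

    def test_empty_input(self):
        self.assertEqual(latest_per_key([]), {})

    def test_out_of_order_input(self):
        records = [("a", 5, "newest"), ("a", 1, "oldest")]
        self.assertEqual(latest_per_key(records), {"a": "newest"})
```

A CI job (Bamboo, Jenkins, ...) could run it with `python -m unittest <module name>`, which exits with a non-zero status when a test fails, so the build fails before the artifact is deployed.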
> 
>> On Wed, Apr 5, 2017 at 7:32 AM, Shiva Ramagopal <tr.shiv@gmail.com> wrote:
>> Hi,
>> 
>> I've been following this thread for a while. 
>> 
>> I'm trying to bring in a test strategy in my team to test a number of data pipelines
>> before production. I have watched Lars' presentation and found it great. However, I'm
>> debating whether unit tests are worth the effort if there are good job-level and
>> pipeline-level tests. Does anybody have any experience benefiting from unit tests in such a case?
>> 
>> Cheers,
>> Shiv
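For comparison, a job-level test treats the job as a black box: write a small input dataset, run the job, and assert only on its output. A minimal sketch, where `run_job` is a hypothetical plain-Python stand-in for a real Spark job (a real test would launch the actual job here instead, e.g. in local mode):

```python
import json
import tempfile
from pathlib import Path


def run_job(input_path: Path, output_path: Path) -> None:
    """Hypothetical stand-in for a batch job: count events per user.

    Reads JSON lines from input_path and writes one JSON line per user,
    sorted by user, to output_path.
    """
    counts = {}
    with input_path.open() as f:
        for line in f:
            if line.strip():
                user = json.loads(line)["user"]
                counts[user] = counts.get(user, 0) + 1
    with output_path.open("w") as f:
        for user in sorted(counts):
            f.write(json.dumps({"user": user, "events": counts[user]}) + "\n")


def test_job_counts_events_per_user():
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        input_path = tmp_dir / "events.jsonl"
        output_path = tmp_dir / "counts.jsonl"
        # Arrange: write a small, hand-crafted input dataset.
        events = [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}]
        input_path.write_text("".join(json.dumps(e) + "\n" for e in events))
        # Act: run the whole job end to end.
        run_job(input_path, output_path)
        # Assert: inspect only the job's output, not its internals.
        rows = [json.loads(line) for line in output_path.read_text().splitlines()]
        assert rows == [{"user": "alice", "events": 2}, {"user": "bob", "events": 1}]
```

Such a test exercises the I/O and wiring that function-level unit tests skip, but it is slower and harder to point at the failing piece, which is the trade-off being discussed here.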
>> 
>>> On Mon, Dec 12, 2016 at 6:00 AM, Juan Rodríguez Hortalá <juan.rodriguez.hortala@gmail.com> wrote:
>>> Hi all, 
>>> 
>>> I would also like to participate in that.
>>> 
>>> Greetings, 
>>> 
>>> Juan 
>>> 
>>>> On Fri, Dec 9, 2016 at 6:03 AM, Michael Stratton <michael.stratton@komodohealth.com> wrote:
>>>> That sounds great, please include me so I can get involved.
>>>> 
>>>>> On Fri, Dec 9, 2016 at 7:39 AM, Marco Mistroni <mmistroni@gmail.com> wrote:
>>>>> Me too, as I spent most of my time writing unit/integration tests... Please advise
>>>>> on where I can start.
>>>>> Kr
>>>>> 
>>>>>> On 9 Dec 2016 12:15 am, "Miguel Morales" <therevoltingx@gmail.com> wrote:
>>>>>> I would be interested in contributing. I've created my own library for this as well.
>>>>>> In my blog post I talk about testing Spark in an RSpec style:
>>>>>> https://medium.com/@therevoltingx/test-driven-development-w-apache-spark-746082b44941
>>>>>> 
>>>>>>> On Dec 8, 2016, at 4:09 PM, Holden Karau <holden@pigscanfly.ca> wrote:
>>>>>>> 
>>>>>>> There are also libraries designed to simplify testing Spark on the various platforms:
>>>>>>> spark-testing-base for Scala/Java/Python (& video https://www.youtube.com/watch?v=f69gSGSLGrY),
>>>>>>> sscheck (Scala-focused, property-based), and pyspark.test (Python-focused, with py.test
>>>>>>> instead of unittest2) (& blog post from Nextdoor:
>>>>>>> https://engblog.nextdoor.com/unit-testing-apache-spark-with-py-test-3b8970dc013b#.jw3bdcej9 )
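sscheck itself is a Scala library; as a rough illustration of the property-based idea only (not sscheck's API), the sketch below hand-rolls random inputs with Python's standard `random` module and checks two properties of a hypothetical word-count function, including that counting partitions separately and merging the partial counts gives the same result as counting everything at once:

```python
import random
from collections import Counter


def count_words(lines):
    """Hypothetical transformation under test: word counts over a list of lines."""
    return Counter(word for line in lines for word in line.split())


def count_words_partitioned(lines, num_partitions):
    """Mimics a distributed run: count each partition, then merge partial counts."""
    partitions = [lines[i::num_partitions] for i in range(num_partitions)]
    total = Counter()
    for part in partitions:
        total.update(count_words(part))  # Counter.update adds counts
    return total


def random_lines(rng):
    """Generates a random list of lines from a tiny vocabulary (empty words included)."""
    vocab = ["spark", "test", "data", "pipeline", ""]
    return [" ".join(rng.choice(vocab) for _ in range(rng.randint(0, 6)))
            for _ in range(rng.randint(0, 20))]


def check_partitioning_property(trials=200, seed=42):
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(trials):
        lines = random_lines(rng)
        parts = rng.randint(1, 8)
        # Property 1: the result must not depend on how the input is partitioned.
        assert count_words_partitioned(lines, parts) == count_words(lines), (lines, parts)
        # Property 2: the total count equals the number of whitespace-separated tokens.
        assert sum(count_words(lines).values()) == sum(len(line.split()) for line in lines)
    return trials
```

The partitioning property targets a classic source of distributed-job bugs (an aggregation that is not safe to merge), which example-based tests with one small input tend to miss.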
>>>>>>> 
>>>>>>> Good luck on your Spark Adventures :)
>>>>>>> 
>>>>>>> P.S.
>>>>>>> 
>>>>>>> If anyone is interested in helping improve Spark testing libraries, I'm always looking
>>>>>>> for more people to be involved with spark-testing-base because I'm lazy :p
>>>>>>> 
>>>>>>>> On Thu, Dec 8, 2016 at 2:05 PM, Lars Albertsson <lalle@mapflat.com> wrote:
>>>>>>>> I wrote some advice in a previous post on the list:
>>>>>>>> http://markmail.org/message/bbs5acrnksjxsrrs
>>>>>>>> 
>>>>>>>> It does not mention Python, but the strategy advice is the same. Just
>>>>>>>> replace JUnit/ScalaTest with pytest, unittest, or your favourite
>>>>>>>> Python test framework.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I recently gave a presentation on the subject. There is a video
>>>>>>>> recording at https://vimeo.com/192429554 and slides at
>>>>>>>> http://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458
>>>>>>>> 
>>>>>>>> You can find more material on test strategies at
>>>>>>>> http://www.mapflat.com/lands/resources/reading-list/index.html
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Lars Albertsson
>>>>>>>> Data engineering consultant
>>>>>>>> www.mapflat.com
>>>>>>>> https://twitter.com/lalleal
>>>>>>>> +46 70 7687109
>>>>>>>> Calendar: https://goo.gl/6FBtlS, https://freebusy.io/lalle@mapflat.com
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Dec 8, 2016 at 4:14 PM, pseudo oduesp <pseudo20140@gmail.com> wrote:
>>>>>>>> > Can someone tell me how I can write unit tests for PySpark?
>>>>>>>> > (book, tutorial ...)
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Cell : 425-233-8271
>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>> 
>>> 
>> 
> 
