spark-user mailing list archives

From Vitaliy Pisarev <vitaliy.pisa...@biocatch.com>
Subject Re: Testing Apache Spark applications
Date Thu, 15 Nov 2018 19:26:08 GMT
This is hard to answer succinctly, but I'll give it a shot.

Cucumber is a tool for writing *Behaviour*-Driven tests (closely related to
behaviour-driven development, BDD).
It is not merely a *technical* approach to testing but a mindset, a way of
working and a different way (whether it is a better one is a matter of
controversy) to structure communication between product and R&D.

I will not elaborate further, as there is plenty of material out there if you
want to educate yourself. Just bear in mind that BDD is riddled with
misconceptions. More often than not I see people just using Cucumber, but
not doing actual BDD.

Regarding unit testing, I do not consider the code you showed to be a good
candidate for unit testing. There is very little procedural logic there, and
there is a good chance that if you go about unit testing it you will end up
with lots and lots of mocks overly bound to the implementation details of
the code under test, rendering the tests brittle and unmaintainable.

I would argue that unit tests are more appropriate for code that is
algorithmic in nature, that has no or very few dependencies, and where
you have an absolute oracle of truth regarding your expectations of it.
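
For illustration, here is a minimal sketch of the kind of code I mean. The
`CustomerRules` object and its `normaliseEmail` rule are hypothetical (they
are not from your code); the point is that the function is pure, has no Spark
or database dependency, and every expectation is an exact oracle, so plain
assertions are enough:

```scala
// Hypothetical example: CustomerRules and normaliseEmail are illustrative only.
object CustomerRules {

  // Trims and lower-cases an e-mail address. Returns None when the result does
  // not have exactly one '@' with non-empty text on both sides.
  def normaliseEmail(raw: String): Option[String] = {
    val cleaned = raw.trim.toLowerCase
    // limit -1 keeps trailing empty strings, so "a@" splits into ("a", "")
    cleaned.split("@", -1) match {
      case Array(local, domain) if local.nonEmpty && domain.nonEmpty => Some(cleaned)
      case _ => None
    }
  }
}

// Plain assertions suffice: each input has exactly one correct answer.
object CustomerRulesSpec {
  def run(): Unit = {
    assert(CustomerRules.normaliseEmail("  Jane.Doe@Example.COM ") == Some("jane.doe@example.com"))
    assert(CustomerRules.normaliseEmail("") == None)
    assert(CustomerRules.normaliseEmail("no-at-sign") == None)
    assert(CustomerRules.normaliseEmail("a@") == None)
    assert(CustomerRules.normaliseEmail("a@b@c") == None)
  }
}
```

Calling `CustomerRulesSpec.run()` evaluates the checks; in a real project they
would more likely live in a test framework such as ScalaTest, but nothing here
depends on one.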

I think that in your situation, going for integration tests (on small-scale
data) and regression tests would give you the most ROI.
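
As a rough sketch of the regression side, assuming the cleansing step and the
write step are available as separate functions: run the pipeline end to end
on a tiny fixed input and compare the whole result with a known-good (golden)
snapshot. `Customer`, `CustomerPipeline` and the golden data are hypothetical,
and plain Scala collections (with a `Map` standing in for the database table)
replace Spark so the sketch runs on its own. A real integration test would
instead create a local session with
`SparkSession.builder.master("local[*]").getOrCreate()` and write to a test
database.

```scala
// Hypothetical stand-ins: Customer, CustomerPipeline and the golden data are
// illustrative only. Plain collections replace Spark and the database here.
final case class Customer(id: Int, name: String, email: String)

object CustomerPipeline {
  // Filtering/cleansing stage: drop rows with a blank e-mail, tidy the rest.
  def cleanse(rows: Seq[Customer]): Seq[Customer] =
    rows
      .filter(_.email.trim.nonEmpty)
      .map(c => c.copy(name = c.name.trim, email = c.email.trim.toLowerCase))

  // "Write" stage: an in-memory table keyed by id stands in for the database;
  // an existing id is overwritten (upsert semantics).
  def upsert(table: Map[Int, Customer], rows: Seq[Customer]): Map[Int, Customer] =
    rows.foldLeft(table)((acc, c) => acc.updated(c.id, c))
}

object CustomerPipelineRegression {
  // Small, fixed input covering the interesting cases.
  val input: Seq[Customer] = Seq(
    Customer(1, " Ann ", "ANN@EXAMPLE.COM"),
    Customer(2, "Bob", "   "),
    Customer(3, "Cy", "cy@example.com")
  )

  // Expected (golden) table state; in practice a reviewed snapshot of a run.
  val golden: Map[Int, Customer] = Map(
    0 -> Customer(0, "Existing", "old@example.com"),
    1 -> Customer(1, "Ann", "ann@example.com"),
    3 -> Customer(3, "Cy", "cy@example.com")
  )

  def run(): Unit = {
    val existing = Map(0 -> Customer(0, "Existing", "old@example.com"))
    val actual = CustomerPipeline.upsert(existing, CustomerPipeline.cleanse(input))
    // Whole-output comparison: any behavioural change shows up as a mismatch.
    assert(actual == golden, s"regression: expected $golden but got $actual")
  }
}
```

Because the comparison covers the whole output, an unintended change in
behaviour surfaces as a failing test without the test knowing how the
pipeline is implemented internally, which avoids the mock-heavy brittleness
described above.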

On Thu, Nov 15, 2018 at 8:43 PM ☼ R Nair <ravishankar.nair@gmail.com> wrote:

> Sparklens from Qubole is a good resource. Other tests have to be handled
> by the developer.
>
> Best,
> Ravi
>
> On Thu, Nov 15, 2018, 12:45 PM <Omer.Ozsakarya@sony.com> wrote:
>
>> Hi all,
>>
>> How are you testing your Spark applications?
>>
>> We are writing features using Cucumber. These test the behaviours. Is
>> this called functional testing or integration testing?
>>
>> We are also planning to write unit tests.
>>
>> For instance, we have a class like the one below. It has one method, and
>> this method implements several things: DataFrame operations, saving a
>> DataFrame into a database table, and insert, update and delete statements.
>>
>> Our classes generally contain 2 or 3 methods. These methods cover a lot
>> of tasks in the same function definition (like the function below).
>>
>> So I am not sure how I can write unit tests for these classes and methods.
>>
>> Do you have any suggestions?
>>
>> import org.apache.spark.sql.DataFrame
>>
>> class CustomerOperations {
>>
>>   def doJob(inputDataFrame: DataFrame): Unit = {
>>     // definitions (values/variables)
>>     // Spark context, session etc. definition
>>
>>     // filtering and cleansing on inputDataFrame; save the results in a
>>     // new DataFrame
>>
>>     // insert the new DataFrame into a database table
>>
>>     // several insert/update/delete statements on the database tables
>>   }
>> }
>
