spark-user mailing list archives

From A Shaikh <shaikh.af...@gmail.com>
Subject Re: TDD in Spark
Date Fri, 20 Jan 2017 09:27:42 GMT
Thanks for all the suggestions. Very helpful.

On 17 January 2017 at 22:04, Lars Albertsson <lalle@mapflat.com> wrote:

> My advice, short version:
> * Start by testing one job per test.
> * Use Scalatest or a standard framework.
> * Generate input datasets with Spark routines, write to local file.
> * Run job with local master.
> * Read output with Spark routines, validate only the fields you care
> about for the test case at hand.
> * Focus on building a functional regression test suite with small test
> cases before testing with large input datasets. The former improves
> productivity more.
>
> Avoid:
> * Test frameworks coupled to your processing technology - they will
> make it difficult to switch.
> * Spending much effort on small unit tests. Internal interfaces in
> Spark tend to be volatile, and testing against them results in high
> maintenance costs.
> * Input files checked in to version control. They are difficult to
> maintain. Generate input files with code instead.
> * Expected output files checked in to VC. Same reason. Validate
> selected fields instead.
>
> For a longer answer, please search for my previous posts to the user
> list, or watch this presentation: https://vimeo.com/192429554
>
> Slides at http://www.slideshare.net/lallea/test-strategies-for-data-processing-pipelines-67244458
>
>
> Regards,
>
>
>
> Lars Albertsson
> Data engineering consultant
> www.mapflat.com
> https://twitter.com/lalleal
> +46 70 7687109
> Calendar: https://goo.gl/6FBtlS, https://freebusy.io/lalle@mapflat.com
>
>
> On Sun, Jan 15, 2017 at 7:14 PM, A Shaikh <shaikh.afzal@gmail.com> wrote:
> > What's the most popular testing approach for Spark apps? I am looking
> > for something along the lines of TDD.
>
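
The short-version advice quoted above (one job per test, input generated by code, output validated on selected fields only) can be sketched roughly as follows. This is a minimal illustration, not code from the thread: it uses Python's standard unittest module, and run_job is a hypothetical stand-in that computes totals in plain Python so the sketch runs without a cluster. In a real suite that function would launch the Spark job with a local master, and the helper functions would more likely use Spark's own read and write routines. The names write_orders, read_records, run_job, and the order fields are all assumptions made for the example.

```python
import json
import tempfile
import unittest
from pathlib import Path


def write_orders(path, orders):
    """Generate the input dataset with code instead of a checked-in file."""
    with open(path, "w") as f:
        for order in orders:
            f.write(json.dumps(order) + "\n")


def read_records(path):
    """Read a JSON-lines file back as a list of dicts."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def run_job(input_path, output_path):
    """Hypothetical stand-in for the job under test.

    Assumption: a real suite would start the Spark job with a local
    master here and let it read input_path and write output_path.
    """
    totals = {}
    for order in read_records(input_path):
        customer = order["customer"]
        totals[customer] = totals.get(customer, 0) + order["amount"]
    with open(output_path, "w") as f:
        for customer, total in sorted(totals.items()):
            # job_version is an extra field that the test deliberately ignores.
            record = {"customer": customer, "total": total, "job_version": "dev"}
            f.write(json.dumps(record) + "\n")


class TotalsJobTest(unittest.TestCase):
    """Tests for a single job; each method covers one small scenario."""

    def test_amounts_are_summed_per_customer(self):
        with tempfile.TemporaryDirectory() as tmp:
            input_path = Path(tmp) / "orders.json"
            output_path = Path(tmp) / "totals.json"
            write_orders(input_path, [
                {"customer": "alice", "amount": 3},
                {"customer": "alice", "amount": 4},
                {"customer": "bob", "amount": 5},
            ])

            run_job(input_path, output_path)

            # Validate only the fields this test case cares about.
            totals = {r["customer"]: r["total"] for r in read_records(output_path)}
            self.assertEqual(totals, {"alice": 7, "bob": 5})
```

Saved as a module, this can be run with python -m unittest. Because the test only touches files on disk and treats the job as a black box, the same test should keep working if the job's internals, or even the processing framework, change.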
