spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: Random sampling in tests
Date Mon, 08 Oct 2018 14:08:24 GMT
I'm personally not a big fan of doing it that way in the PR. It is
perfectly fine to employ randomized tests, and in this case it might even
be fine to just pick a couple of different timezones, as the PR does,
but we should:

1. Document in the code comment why we did it that way.

2. Use a seed and log the seed, so any test failure can be reproduced
deterministically. For this one, it'd be better to read the seed from an
environment variable. If the env variable is not set, fall back to a
random seed.
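
To make point 2 concrete, here is a minimal sketch of the pattern, not the code from the PR. The environment variable name SPARK_TEST_SEED, the helper names, and the timezone list are all hypothetical, chosen only for illustration.

```python
import os
import random

# Hypothetical variable name; the thread does not specify one.
SEED_ENV_VAR = "SPARK_TEST_SEED"


def resolve_seed(env=os.environ):
    """Return the seed from the environment, or a fresh random one if unset."""
    raw = env.get(SEED_ENV_VAR)
    if raw is not None:
        return int(raw)
    return random.SystemRandom().randrange(2**63)


def sample_cases(cases, k, seed):
    """Pick k cases; the same seed always yields the same selection."""
    rng = random.Random(seed)
    # Sort first so the result does not depend on the input ordering.
    return rng.sample(sorted(cases), k)


if __name__ == "__main__":
    # Illustrative subset only, not the list used in the actual test suite.
    all_timezones = ["UTC", "America/Los_Angeles", "Asia/Tokyo",
                     "Europe/Berlin", "Australia/Sydney"]
    seed = resolve_seed()
    # Log the seed so a failing run can be replayed exactly.
    print(f"Random test seed: {seed} (set {SEED_ENV_VAR} to reproduce)")
    for tz in sample_cases(all_timezones, 2, seed):
        print(f"testing timezone {tz}")
```

With something like this, a failure seen on Jenkins could be replayed locally by exporting the logged seed before rerunning the suite.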



On Mon, Oct 8, 2018 at 3:05 PM Sean Owen <srowen@gmail.com> wrote:

> Recently, I've seen 3 pull requests that try to speed up a test suite
> that tests a bunch of cases by randomly choosing different subsets of
> cases to test on each Jenkins run.
>
> There's disagreement about whether this is a good approach to improving
> test runtime. Here's a discussion on one that was committed:
> https://github.com/apache/spark/pull/22631/files#r223190476
>
> I'm flagging it for more input.
>
