spark-dev mailing list archives

From Bryan Cutler <cutl...@gmail.com>
Subject Re: python tests: any reason for a huge tests.py?
Date Fri, 14 Sep 2018 00:06:41 GMT
Hi Imran,

I agree it would be good to split up the tests, but there might be a couple
of things to discuss first. Right now we have a single "tests.py" for each
subpackage. I think it makes sense to have roughly one test file for most
modules, e.g. "test_rdd.py", but it might not always be clear-cut and there
could be other ways to split them up. Also, should we put the test files
in the same directory as the source or in a subdirectory named "tests"? My
preference is for a subdirectory. As for putting new tests into their own
files right away, it seems better to me to keep them with related tests for
now and do the separation as its own task, to avoid fragmenting the test
suites. If it's done incrementally, I don't think merge conflicts will cause
a problem. Let me summarize this in SPARK-25344.
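
To make the subdirectory option concrete, here is a minimal sketch of
per-module test files living under a "tests" directory and being picked up
by the standard unittest discovery. Everything in it is an assumption for
illustration: the file names other than "test_rdd.py", the class names, and
the placeholder test bodies are hypothetical and are not the actual PySpark
tests, and the sketch does not use run-tests.py at all.

```python
import os
import tempfile
import textwrap
import unittest

# Hypothetical per-module test files. The names and bodies are placeholders
# for illustration only, not real PySpark tests.
TEST_FILES = {
    "test_rdd.py": """
        import unittest

        class RDDTests(unittest.TestCase):
            def test_placeholder(self):
                self.assertEqual(sum([1, 2, 3]), 6)
    """,
    "test_context.py": """
        import unittest

        class ContextTests(unittest.TestCase):
            def test_placeholder(self):
                self.assertTrue(True)
    """,
}


def build_layout(root):
    """Create <root>/tests/ containing one small test file per module."""
    tests_dir = os.path.join(root, "tests")
    os.makedirs(tests_dir)
    for name, source in TEST_FILES.items():
        with open(os.path.join(tests_dir, name), "w") as f:
            f.write(textwrap.dedent(source))
    return tests_dir


def run_split_tests():
    """Discover and run every test_*.py file in the tests subdirectory."""
    with tempfile.TemporaryDirectory() as root:
        tests_dir = build_layout(root)
        # No __init__.py is written here, so discovery treats tests_dir
        # itself as the top-level directory and imports each file directly.
        suite = unittest.TestLoader().discover(tests_dir, pattern="test_*.py")
        result = unittest.TestResult()
        suite.run(result)
        return result


if __name__ == "__main__":
    outcome = run_split_tests()
    print("tests run:", outcome.testsRun, "ok:", outcome.wasSuccessful())
```

With a layout like this, each file can be run on its own, which is the
property that would let a parallel runner schedule the files separately.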

Thanks,
Bryan

On Wed, Sep 12, 2018 at 10:48 AM Imran Rashid <irashid@cloudera.com.invalid>
wrote:

> So I've had some offline discussion around this, so I'd like to clarify.
> SPARK-25344 may be some non-trivial work to do, as it's a significant
> refactoring.
>
> But can we agree on an *immediate* first step: all new Python tests should
> go into their own files?  Is there some reason to not do that right away?
>
> I understand that in some cases, you'll want to add a test case that really
> is related to an existing test already in those giant files, and it makes
> sense for you to keep them close.  It's fine to decide on a case-by-case
> basis whether we should do the relevant refactoring for that bit at the
> same time or just put the new test in the same file.  But we should still
> have this *goal* in mind, so you should do it in the cases where the new
> tests really are independent.
>
> That avoids making the problem worse until we get to SPARK-25344, and
> furthermore it will allow work on SPARK-25344 to eventually proceed without
> never-ending merge conflicts with other changes that are also adding new
> tests.
>
> On Wed, Sep 5, 2018 at 1:27 PM Imran Rashid <irashid@cloudera.com> wrote:
>
>> I filed https://issues.apache.org/jira/browse/SPARK-25344
>>
>> On Fri, Aug 24, 2018 at 11:57 AM Reynold Xin <rxin@databricks.com> wrote:
>>
>>> We should break it.
>>>
>>> On Fri, Aug 24, 2018 at 9:53 AM Imran Rashid
>>> <irashid@cloudera.com.invalid> wrote:
>>>
>>>> Hi,
>>>>
>>>> Another question from looking more at Python recently.  Is there any
>>>> reason we've got a ton of tests in one humongous tests.py file, rather than
>>>> breaking it out into smaller files?
>>>>
>>>> Having one huge file doesn't seem great for code organization, and it
>>>> also makes the test parallelization in run-tests.py not work as well.  On
>>>> my laptop, tests.py takes 150s, and the next longest test file takes only
>>>> 20s.
>>>>
>>>> Can we at least try to put new tests into smaller files?
>>>>
>>>> thanks,
>>>> Imran
>>>>
>>>
