spark-dev mailing list archives

From Imran Rashid <>
Subject Re: python tests: any reason for a huge
Date Wed, 12 Sep 2018 17:47:49 GMT
So I've had some offline discussion around this, so I'd like to clarify.
SPARK-25344 may be some non-trivial work to do, as it's significant

But can we agree on an *immediate* first step: all new python tests should
go into their own files?  Is there some reason not to do that right away?

I understand that in some cases, you'll want to add a test case that really
is related to an existing test already in those giant files, and it makes
sense for you to keep them close.  It's fine to decide on a case-by-case
basis whether to do the relevant refactoring for that bit at the same time
or just put the new test in the same file.  But we should still have this
*goal* in mind, so you should use a new file whenever the test is really
independent.

That avoids making the problem worse until we get to SPARK-25344, and
furthermore it will allow work on SPARK-25344 to eventually proceed without
never-ending merge conflicts with other changes that are also adding new

On Wed, Sep 5, 2018 at 1:27 PM Imran Rashid <> wrote:

> I filed
> On Fri, Aug 24, 2018 at 11:57 AM Reynold Xin <> wrote:
>> We should break it.
>> On Fri, Aug 24, 2018 at 9:53 AM Imran Rashid <>
>> wrote:
>>> Hi,
>>> another question from looking more at python recently.  Is there any
>>> reason we've got a ton of tests in one humongous file, rather than
>>> breaking it out into smaller files?
>>> Having one huge file doesn't seem great for code organization, and it
>>> also makes the test parallelization in not work as well.  On
>>> my laptop, takes 150s, and the next longest test file takes only
>>> 20s.
>>> can we at least try to put new tests into smaller files?
>>> thanks,
>>> Imran
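
To illustrate the parallelization point in the original message, here is a
minimal sketch of why one huge test file limits the speedup from running test
files in parallel. It assumes each file runs whole on a single worker and that
files are handed out longest-first to whichever worker is free; that is a
simplification for illustration, not a description of how Spark's test runner
schedules work. Only the 150s and 20s figures come from the message. The
number of other files, the worker count, and the way the big file is split are
all hypothetical.

```python
import heapq


def makespan(durations, workers):
    """Wall-clock time when each test file runs whole on one worker.

    Files are assigned longest-first to the worker that frees up earliest
    (a simple greedy schedule, assumed here for illustration only).
    """
    finish_times = [0] * workers
    heapq.heapify(finish_times)
    for duration in sorted(durations, reverse=True):
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + duration)
    return max(finish_times)


# 150s and 20s are the figures quoted in the message; six other 20s files
# and four workers are assumptions made up for this sketch.
one_huge_file = [150] + [20] * 6

# Hypothetical split of the 150s file into five independent 30s files.
split_files = [30] * 5 + [20] * 6

print(makespan(one_huge_file, 4))
print(makespan(split_files, 4))
```

Under these assumptions the total work is the same in both cases (270s), but
the run with one huge file can never finish faster than that file alone, while
the split version spreads evenly across the workers.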
