spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hyukjin Kwon <gurwls...@gmail.com>
Subject Re: Adding JIRA ID as the prefix for the test case name
Date Fri, 22 Nov 2019 04:09:44 GMT
I opened a PR - https://github.com/apache/spark-website/pull/232

2019년 11월 19일 (화) 오전 9:22, Hyukjin Kwon <gurwls223@gmail.com>님이 작성:

> Let me document as below in few days:
>
> 1. For Python and Java, write a single comment that starts with JIRA ID
> and short description, e.g. (SPARK-XXXXX: test blah blah)
> 2. For R, use JIRA ID as a prefix for its test name.
>
> assuming everybody is happy.
>
> 2019년 11월 18일 (월) 오전 11:36, Hyukjin Kwon <gurwls223@gmail.com>님이
작성:
>
>> Actually there are not so many Java test cases in Spark (because Scala
>> runs on JVM as everybody knows)[1].
>>
>> Given that, I think we can avoid to put some efforts on this for now .. I
>> don't mind if somebody wants to give a shot since it looks good anyway but
>> to me I wouldn't spend so much time on this ..
>>
>> Let me just go ahead as I suggested if you don't mind. Anyone can give a
>> shot for Display Name - I'm willing to actively review and help.
>>
>> [1]
>> git ls-files '*Suite.java' | wc -l
>>      172
>> git ls-files '*Suite.scala' | wc -l
>>     1161
>>
>> 2019년 11월 18일 (월) 오전 3:27, Steve Loughran <stevel@cloudera.com>님이
작성:
>>
>>> Test reporters do often contain some assumptions about the characters in
>>> the test methods. Historically JUnit XML reporters have never sanitised the
>>> method names so XML injection attacks have been fairly trivial. Haven't
>>> tried this for a while.
>>>
>>> That whole JUnit XML report "standard" was actually put together in the
>>> Ant project with <Junitreport> doing the postprocessing of the JUnit run.
>>> It was driven by the team's XSL skills than any overreaching strategic goal
>>> about how to present test results of tests which could run for hours and
>>> whose output you would really want to aggregate the locks from multiple
>>> machines and processes and present in awake you can actually navigate. With
>>> hindsight, a key failing is that we chose to store the test summaries (test
>>> count, failure count...) as attributes on the root XML mode. Which is why
>>> the whole DOM gets built up in the JUnit runner. Which is why when that
>>> JUnit process crashes, you get no report at all.
>>>
>>> It'd be straightforward to fix -except too much relies on that file
>>> now...important things will break. And the maven runner has historically
>>> never supported custom reporters, to let you experiment with it.
>>>
>>> Maybe this is an opportunity to change things.
>>>
>>> On Sun, Nov 17, 2019 at 1:42 AM Hyukjin Kwon <gurwls223@gmail.com>
>>> wrote:
>>>
>>>> DisplayName looks good in general but actually here I would like first
>>>> to find a existing pattern to document in guidelines given the actual
>>>> existing practice we all are used to. I'm trying to be very conservative
>>>> since this guidelines affect everybody.
>>>>
>>>> I think it might be better to discuss separately if we want to change
>>>> what we have been used to.
>>>>
>>>> Also, using arbitrary names might not be actually free due to such bug
>>>> like https://github.com/apache/spark/pull/25630 . It will need some
>>>> more efforts to investigate as well.
>>>>
>>>> On Fri, 15 Nov 2019, 20:56 Steve Loughran, <stevel@cloudera.com.invalid>
>>>> wrote:
>>>>
>>>>>  Junit5: Display names.
>>>>>
>>>>> Goes all the way to the XML.
>>>>>
>>>>>
>>>>> https://junit.org/junit5/docs/current/user-guide/#writing-tests-display-names
>>>>>
>>>>> On Thu, Nov 14, 2019 at 6:13 PM Shixiong(Ryan) Zhu <
>>>>> shixiong@databricks.com> wrote:
>>>>>
>>>>>> Should we also add a guideline for non Scala tests? Other languages
>>>>>> (Java, Python, R) don't support using string as a test name.
>>>>>>
>>>>>> Best Regards,
>>>>>> Ryan
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon <gurwls223@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I opened a PR - https://github.com/apache/spark-website/pull/231
>>>>>>>
>>>>>>> 2019년 11월 13일 (수) 오전 10:43, Hyukjin Kwon <gurwls223@gmail.com>님이
작성:
>>>>>>>
>>>>>>>> > In general a test should be self descriptive and I don't
think we
>>>>>>>> should be adding JIRA ticket references wholesale. Any action
that the
>>>>>>>> reader has to take to understand why a test was introduced
is one too many.
>>>>>>>> However in some cases the thing we are trying to test is
very subtle and in
>>>>>>>> that case a reference to a JIRA ticket might be useful, I
do still feel
>>>>>>>> that this should be a backstop and that properly documenting
your tests is
>>>>>>>> a much better way of dealing with this.
>>>>>>>>
>>>>>>>> Yeah, the test should be self-descriptive. I don't think
adding a
>>>>>>>> JIRA prefix harms this point. Probably I should add this
sentence in the
>>>>>>>> guidelines as well.
>>>>>>>> Adding a JIRA prefix just adds one extra hint to track down
>>>>>>>> details. I think it's fine to stick to this practice and
make it simpler
>>>>>>>> and clear to follow.
>>>>>>>>
>>>>>>>> > 1. what if multiple JIRA IDs relating to the same test?
we just
>>>>>>>> take the very first JIRA ID?
>>>>>>>> Ideally one JIRA should describe one issue and one PR should
fix
>>>>>>>> one JIRA with a dedicated test.
>>>>>>>> Yeah, I think I would take the very first JIRA ID.
>>>>>>>>
>>>>>>>> > 2. are we going to have a full scan of all existing
tests and
>>>>>>>> attach a JIRA ID to it?
>>>>>>>> Yea, let's don't do this.
>>>>>>>>
>>>>>>>> > It's a nice-to-have, not super essential, just because
...
>>>>>>>> It's been asked multiple times and each committer seems having
a
>>>>>>>> different understanding on this.
>>>>>>>> It's not a biggie but wanted to make it clear and conclude
this.
>>>>>>>>
>>>>>>>> > I'd add this only when a test specifically targets a
certain
>>>>>>>> issue.
>>>>>>>> Yes, so this one I am not sure. From what I heard, people
adds the
>>>>>>>> JIRA in cases below:
>>>>>>>>
>>>>>>>> - Whenever the JIRA type is a bug
>>>>>>>> - When a PR adds a couple of tests
>>>>>>>> - Only when a test specifically targets a certain issue.
>>>>>>>> - ...
>>>>>>>>
>>>>>>>> Which one do we prefer and simpler to follow?
>>>>>>>>
>>>>>>>> Or I can combine as below (im gonna reword when I actually
document
>>>>>>>> this):
>>>>>>>> 1. In general, we should add a JIRA ID as prefix of a test
when a
>>>>>>>> PR targets to fix a specific issue.
>>>>>>>>     In practice, it usually happens when a JIRA type is a
bug or a
>>>>>>>> PR adds a couple of tests.
>>>>>>>> 2. Uses "SPARK-XXXX: test name" format
>>>>>>>>
>>>>>>>> If we have no objection with ^, let me go with this.
>>>>>>>>
>>>>>>>> 2019년 11월 13일 (수) 오전 8:14, Sean Owen <srowen@gmail.com>님이
작성:
>>>>>>>>
>>>>>>>>> Let's suggest "SPARK-12345:" but not go back and change
a bunch of
>>>>>>>>> test cases.
>>>>>>>>> I'd add this only when a test specifically targets a
certain issue.
>>>>>>>>> It's a nice-to-have, not super essential, just because
in the rare
>>>>>>>>> case you need to understand why a test asserts something,
you can
>>>>>>>>> go
>>>>>>>>> back and find what added it in the git history without
much
>>>>>>>>> trouble.
>>>>>>>>>
>>>>>>>>> On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon <gurwls223@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> >
>>>>>>>>> > Hi all,
>>>>>>>>> >
>>>>>>>>> > Maybe it's not a big deal but it brought some confusions
time to
>>>>>>>>> time into Spark dev and community. I think it's time
to discuss about
>>>>>>>>> when/which format to add a JIRA ID as a prefix for the
test case name in
>>>>>>>>> Scala test cases.
>>>>>>>>> >
>>>>>>>>> > Currently we have many test case names with prefixes
as below:
>>>>>>>>> >
>>>>>>>>> > test("SPARK-XXXXX blah blah")
>>>>>>>>> > test("SPARK-XXXXX: blah blah")
>>>>>>>>> > test("SPARK-XXXXX - blah blah")
>>>>>>>>> > test("[SPARK-XXXXX] blah blah")
>>>>>>>>> > …
>>>>>>>>> >
>>>>>>>>> > It is a good practice to have the JIRA ID in general
because,
>>>>>>>>> for instance,
>>>>>>>>> > it makes us put less efforts to track commit histories
(or even
>>>>>>>>> when the files
>>>>>>>>> > are totally moved), or to track related information
of tests
>>>>>>>>> failed.
>>>>>>>>> > Considering Spark's getting big, I think it's good
to document.
>>>>>>>>> >
>>>>>>>>> > I would like to suggest this and document it in
our guideline:
>>>>>>>>> >
>>>>>>>>> > 1. Add a prefix into a test name when a PR adds
a couple of
>>>>>>>>> tests.
>>>>>>>>> > 2. Uses "SPARK-XXXX: test name" format which is
used in our code
>>>>>>>>> base most
>>>>>>>>> >       often[1].
>>>>>>>>> >
>>>>>>>>> > We should make it simple and clear but closer to
the actual
>>>>>>>>> practice. So, I would like to listen to what other people
think. I would
>>>>>>>>> appreciate if you guys give some feedback about when
to add the JIRA
>>>>>>>>> prefix. One alternative is that, we only add the prefix
when the JIRA's
>>>>>>>>> type is bug.
>>>>>>>>> >
>>>>>>>>> > [1]
>>>>>>>>> > git grep -E 'test\("\SPARK-([0-9]+):' | wc -l
>>>>>>>>> >      923
>>>>>>>>> > git grep -E 'test\("\SPARK-([0-9]+) ' | wc -l
>>>>>>>>> >      477
>>>>>>>>> > git grep -E 'test\("\[SPARK-([0-9]+)\]' | wc -l
>>>>>>>>> >       16
>>>>>>>>> > git grep -E 'test\("\SPARK-([0-9]+) -' | wc -l
>>>>>>>>> >       13
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>

Mime
View raw message