spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Holden Karau <>
Subject Re: Crowdsourced triage Scapegoat compiler plugin warnings
Date Thu, 13 Jul 2017 08:53:03 GMT
I'm happy to help out in the places I'm more familiar with some starting
next week. I suspect things probably just fell to the wayside for some
folks during the 2.2.0 release.

On Thu, Jul 13, 2017 at 1:16 AM, Hyukjin Kwon <> wrote:

> Hi all,
> Another gentle ping for help.
> Probably, let me open up a JIRA and proceed this after a couple of weeks
> if no one is going to do this although I hope someone takes this.
> Thanks.
> 2017-06-18 2:16 GMT+09:00 Sean Owen <>:
>> Looks like a whole lot of the results have been analyzed. I suspect
>> there's more than enough to act on already. I think we should wait until
>> after 2.2 is done.
>> Anybody prefer how to proceed here -- just open a JIRA to take care of a
>> batch of related types of issues and go for it?
>> On Sat, Jun 17, 2017 at 4:45 PM Hyukjin Kwon <> wrote:
>>> Gentle ping to dev for help. I hope this effort is not abandoned.
>>> On 25 May 2017 9:41 am, "Josh Rosen" <> wrote:
>>> I'm interested in using the Scapegoat
>>> <> Scala compiler plugin to find
>>> potential bugs and performance problems in Spark. Scapegoat has a useful
>>> built-in set of inspections and is pretty easy to extend with custom ones.
>>> For example, I added an inspection to spot places where we call
>>> *.apply()* on a Seq which is not an IndexedSeq
>>> <> in order to make it
>>> easier to spot potential O(n^2) performance bugs.
>>> There are lots of false-positives and benign warnings (as with any
>>> linter / static analyzer) so I don't think it's feasible to us to include
>>> this as a blocking step in our regular build. I am planning to build
>>> tooling to surface only new warnings so going forward this can become a
>>> useful code-review aid.
>>> The current codebase has roughly 1700 warnings that I would like to
>>> triage and categorize as false-positives or real bugs. I can't do this
>>> alone, so here's how you can help:
>>>    - Visit the Google Docs spreadsheet at
>>>    sheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit?
>>>    usp=sharing
>>>    <>
>>>    find an un-triaged warning.
>>>    - In the columns at the right of the sheet, enter your name in the
>>>    appropriate column to mark a warning as a false-positive or as a real bug
>>>    and/or performance issue. If think a warning is a real issue then use the
>>>    "comments" column for providing additional detail.
>>>    - Please don't file JIRAs or PRs for individual warnings; I suspect
>>>    that we'll find clusters of issues which are best fixed in a few larger PRs
>>>    vs. lots of smaller ones. Certain warnings are probably simply style issues
>>>    so we should discuss those before trying to fix them.
>>> The sheet has hidden columns capturing the Spark revision and Scapegoat
>>> revision. I can use this to programmatically update the sheet and remap
>>> lines after updating either Scapegoat (to suppress false-positives) or
>>> Spark (to incorporate fixes and surface new warnings). For those who are
>>> interested, the sheet was produced with this script:
>>> Depending on the results of this experiment we might want to integrate a
>>> high-signal subset of the Scapegoat warnings into our build. I'm also
>>> hoping that we'll be able to build a useful corpus of triaged warnings in
>>> order to help improve Scapegoat itself and eliminate common false-positives.
>>> Thanks and happy bug-hunting,
>>> Josh Rosen

Cell : 425-233-8271

View raw message