commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [RNG] Duplicated blocks?
Date Fri, 13 Sep 2019 20:34:41 GMT


> On 13 Sep 2019, at 15:27, Gilles Sadowski <gilleseran@gmail.com> wrote:
> 
> Hello.
> 
> SonarQube reports "duplicated blocks":
>    https://sonarcloud.io/component_measures?id=commons-rng&metric=duplicated_blocks&view=list
> 
> Does it report about the license header?

No. If you click though to the source code (from a violation in the list) and scroll you will
see a bar in the left margin for the duplicated code. You can click on it and it states where
the duplicate is.

There are 13 reported duplicates and it’s not always a spurious report. Some of the code
blocks are duplicates. In this case it is for code that performs work to compute more than
a single value. As such it is not a simple refactor to move the block into a method without
having to return an object encapsulating the multiple values computed; or pass in an array
to hold all the value computed; or have a common parent with shared state variable. In the
case of common code in the random generator next() method of Well19937a and Well44497a the
various workarounds to eliminate common code would have negative effects and is not worth
it IMO. In one case it is in classes within different packages (e.g. AbstractPcg6432 vs PcgRxsMXs64)
where the amount of duplicate code is actually the same but it is outside of the package structure
to have a common parent. That is unreleased code though so could be rearranged without breaking
binary compatibility.

In about half of the cases it is a spurious report where:

- the code block as text is identical but the types of the variables are different (e.g. byte/short/int
in the MarsagliaTsangWangDiscreteSampler) - in these cases I could try renaming variables
but it seems odd since the code is deliberately copied with modification only for the data
type.
- the value of the constants are different on a single line of code that calls a method (e.g.
the AbstractXoRoShiRo128 vs AbstractXoShiRo256).

In one case it is exactly the same algorithm (BoxMullerGaussianSampler vs BoxMullerNormalizedGaussianSampler);
here one is deprecated as it is a known duplicate.

> If so, how to deactivate that spurious report?

Don’t know. I’m currently OK with all the duplicates as they are. Ideally they could be
removed as exceptions without turning off the duplicates report but leaving them on is fine.

There are a fair number of other code smells (total = 107).

We can get rid of 9 by renaming SEED to S in the commons/rng/simple/internal/ Converters.

There are 33 for deprecation to ignore.

There are 18 for tests that have no assertions. These are tests that check constructors do
not throw when self-seeded (i.e. input seed is too small). They could be updated to assert
the created object is not null. It seems pointless to do that. The methods are commented to
state they are checking construction paths. The generators could be tested to ensure they
do not return all zeros or the same number. But a full statistical test of the output seems
over the top.

There are 15 'Extract the assignment out of this expression’ that I prefer as is in MarsagliaTsangWangDiscreteSampler.
But I did write it. I could change this and it would not affect sampler performance as the
code is all in the sampler creation method. The other 3 are in the generator next() methods
and are written that way to match the reference code. It is a style that I am fine with in
these cases. If changed I would like to do performance tests to check it doesn’t make the
method slower to remove a coding style warning. So I recommend not changing those for simplicity.

3 for commented out code in the pom.xml could be fixed.

There are 2 ignored tests using zero array seeds. It is known that xor-shift based generators
cannot handle all zero seeds. These tests would never work for those generators (there are
18 such generators in the library). I suggest deleting these tests. An alternative would be
to test the first 50 values. If all zero then assume the generator is dead. Otherwise do the
statistical test on the generator. The test could be commented to state that if a generator
is outputting non-zero values from an all zero seed that the output should be random.

The TODO comment in InternalGamma can be removed. It is 3 years old and there is not currently
a need to move this helper class to a different component.

Then there are a few legitimate oddities for the remaining 10 or so. I can comment the code
so this can be seen when clicking through from Sonar.

Doing all the fixes would remove 48 code smells. It would make looking at the Sonar report
easier.

Alex


> 
> Regards,
> Gilles
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message