commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [Rng] User Guide Quality section
Date Thu, 14 Mar 2019 21:38:31 GMT


> On 14 Mar 2019, at 18:22, Gilles Sadowski <gilleseran@gmail.com> wrote:
> 
> Hello.
> 
> Le jeu. 14 mars 2019 à 12:58, Alex Herbert <alex.d.herbert@gmail.com> a écrit
:
>> 
>> I've just added a Jira task to add the new XorShiRo generaters to the
>> user guide.
>> 
>> I note that the order of the RNGs in the *Quality* section does not
>> match the order of the RandomSource enum, nor is it alphabetical. The
>> list order is defined in
>> org.apache.commons.rng.examples.stress.GeneratorsList. Nearly all the
>> IntProviders are first with the exception of MWC_256 and KISS.
>> 
>> So:
>> 
>> - Should the new XoShiRo generators be added to the end?
> 
> +1
> [To minimize source code changes.]

Agreed.

> 
>> - Should all the *Quality* results be reordered to match the
>> RandomSource enum?
> 
> +0
> [Unless some other orders make sense: from fast to slow, from good
> to bad, …]

Since the order in the rng.apt file does not have to match the underlying GeneratorsList or
RandomSource enum then sorting/ordering the user guide table is the easiest option.

If we use total failures for BigCrush this makes the order:

RNG identifier	Dieharder	BigCrush
JDK	11, 12, 13	74, 72, 75
WELL_512_A	0, 0, 0	7, 6, 6
WELL_1024_A	0, 0, 0	4, 4, 5
WELL_44497_A	0, 0, 0	2, 3, 3
WELL_19937_A	0, 0, 0	3, 2, 3
WELL_19937_C	0, 1, 0	2, 2, 3
MT_64	0, 0, 1	3, 2, 3
MT	0, 1, 0	3, 2, 2
WELL_44497_B	0, , 0	2, 2, 2
KISS	0, 0, 0	1, 2, 0
ISAAC	0, 0, 1	0, 1, 0
TWO_CMRES	1, 1, 1	0, 0, 1
SPLIT_MIX_64	0, 0, 0	2, 0, 0
XOR_SHIFT_1024_S	0, 0, 0	2, 0, 0
MWC_256	0, 0, 0	0, 0, 0


I am not sure I like this. It appears more of a mess to my eye. At least in the current order
the WELL family is grouped together.

Maybe the only thing I would change is to put the MT and MT_64 together.

> 
>> - Should results be alphabetical?
> 
> -0
> [The names are not especially telling…]

Agreed. It was logical but not useful. I’ve just done it in my spreadsheet copy and it also
looks a mess to my eye.

> 
>> - Should the MWC_256 and KISS be moved so the generators are listed for
>> IntProvider then LongProvider?
> 
> ?

Since the end-user does not have to know if the RNG is a based on making int or long then
this suggestion is ruled out.

> 
>> Looking at the history for RandomSource enum and the GeneratorsList it
>> seems the order has always been the same in both, although not between
>> them. If new generators are added to the end of the RandomSource enum
>> then perhaps all the other lists used in the test suites and
>> benchmarking suites should match the same order.
> 
> The order in "RandomSource" is immaterial to the te user (i.e. an application
> should certainly not rely on that order).
> 
>> I would vote for requiring that new generators are added to the end of
>> the RandomSource enum.
> 
> +1
> 
>> All other lists through the code should match
>> this order.
> 
> What when the list split in two ("Int" vs "Long"), then combined for running
> a test/benchmark?  Results are not going to be sorted.  At first sight, I don't
> think it's worth making this sort of requirements (unless everything can be
> automated to provide the sorted output for free).

Keeping the order the same through the codebase was an idea just for maintenance. There is
very similar functionality here:

[test] org.apache.commons.rng.core.ProvidersList
[test] org.apache.commons.rng.simple.ProvidersList
org.apache.commons.rng.examples.stress.GeneratorsList
org.apache.commons.rng.examples.jmh.ConstructionPerformance$Sources
org.apache.commons.rng.examples.jmh.GenerationPerformance$Sources
org.apache.commons.rng.examples.jmh.distribution.SamplersPerformance$Sources

The first two build a list with a seed. Core uses a random seed from SecureRandom. Simple
uses a fixed seed. It still tests randomness using copied code from the tests in core. However
it does have a note: "check whether it is possible to have a single implementation.” As
an aside I think this test for randomness should stay but the generator should be built with
a null seed. Currently it tests empty seeds and an empty seed array is not processed specially
by the ProviderBuilder so this actually tests self-seeding of the generator. Self-Seeding
is already testing in the core module. I suggest the ProviderBuilder be updated to detect
empty seed arrays and do something better, like handle an empty array as if it were null.
The test in simple should also add test for randomness using a null seed, effectively testing
the RandomSource.create method creates a provider that works given no seed, i.e. like testEmptyIntArraySeed
but with a null seed:

@Test
public void testNullSeed() {
   // … Do the checkNextIntegerInRange test
}

I also note that there is this:

@Ignore@Test
public void testZeroIntArraySeed()

Currently the XorShift1024Star generator would fail this. This would also be failed by all
the new XoShiRo generators. I think this should be fixed and the test reinstated. Sooner or
later someone will try RandomSource.create(RandomSource.XYZ, new long[50]) and get a bad generator.
I am not sure how to fix it yet[1] but I would like to try.

The third list does not have any seeds. This arguably could be generated automatically using
the RandomSource enum for all generators that do not have multiple constructor arguments (i.e.
all but TWO_CMRES_SELECT).

The other lists are built using JMH annotations from strings. These cannot be auto-generated.

This makes only 2 key lists to maintain: one in core and one in simple. These should be updated
when a new RNG is added just to get tests working. The order does not matter. However I think
the test in simple should add an assertion that the list of providers is the expected length
given the RandomSource enum. Currently this would be in the static initialisation for the
ProvidersList:

if (LIST.size() != RandomSource.values().length) {
    throw new RuntimeException("Providers list is not the expected size");
}

This would make PRs for a new generator fail if the functionality in simple has not be added
to the tests but was added to RandomSource.

The third list could be rearranged to match the RandomSource enum and so would be autogenerated.
That would mean rearranging the results files from Dieharder and BigCrush. I’m happy to
do this with some scripted git mv commands.

The JMH lists I suppose would be updated when the benchmarks need to be redone for new generators.
This is manual labour.

Summary of my proposals:

- Keep the order in the user guide as is but move MT next to MT_64
- Auto generate the GeneratorsList from the RandomSource enum
- Update all the file names and the user guide to point to the renamed files for the new order
- New generators can go anywhere in the user guide table if related to others, or at the end
- Assert that the ProviderList in [simple] matches RandomSource in size
- Add test that a null seed passed to RandomSource.create can build a generator that passes
the test for randomness
- Update the ProviderBuilder to handle zero filled seeds for generators that cannot handle
them
- Update the ProviderBuilder to handle an empty array as if it were a null seed

[1] The new XoShiRo generators break when the entire seed is zero. However depending on the
first sequence iteration they can handle a lot of zeros. For one generator I seem to recall
that only the final long of the seed needs to have a single bit set in the least significant
X bits (I cannot recall X). If bits are set above the least significant X then even though
the seed is not zero the generator breaks (the first step shifted all the bits using <<
(64-X)). I tried to build a method to test for bad seeds but then it was decided that broken
generators were OK as they match the reference implementation. However we should do better
for the [simple] module and try to make it work, at least as far as possible.

> 
> Regards,
> Gilles
> 
>> 
>> 
>> The order in PR #30 for the XorShift1024StarPhi generator adds to the
>> RandomSource enum directly after XOR_SHIFT_1024_S.
>> 
>> The order in PR #20 for the new XorShiRo generators adds to the
>> RandomSource enum at the end in alphabetical order for the IntProviders
>> then again for the new LongProviders.
>> 
>> These can be fixed depending on the order decision when the PRs are merged.
>> 
>> 
>> Alex
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message