[ https://issues.apache.org/jira/browse/RNG85?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=16814611#comment16814611
]
Alex D Herbert commented on RNG85:

I've put together an implementation of this generator without using the original code to do
the seeding.
To check it against a reference implementation I currently have the RNG constructor accept
a single {{long}}. This is used to create the increment for the Weyl sequence. The only condition
for this input is that is an odd number. However it helps if this increment has good randomness
in the bits (more on this later). So the input seed is checked for complexity using a method
similar to {{SplittableRandom}} (which is also based on a Weyl sequence). The check counts
the number of transitions in the seed from 0 to 1 or vice versa.
{code:java}
static int countTransitions(long value) {
// Signextending shift properly counts bits at the ends
return Long.bitCount(value ^ (value >> 1));
}
{code}
If it passes a critical value then the value goes through untouched. So it can be seeded using
any of the example seeds in the original paper or source code which have good complexity.
If the seed is low complexity then the complexity is boosted by {{xor}} ing the seed with
an alternating bit mask {{0xaaaaaaaa}} for the upper and lower 32bits separately. The SplittableRandom
defines complexity as less than 24 transitions in the 64bit seed. I have modelled the possible
transitions as a Binomial distribution (BD) with n the number of changes and p set to 0.5.
Dividing the seed into upper and lower bits this is then modelled using a BD(n=31,p=0.5).
This has a mean of np which is 15.5. The cumulative probabilty is readily computed. The cutoff
of 24 used in the SplittableRandom corresponds to approximately 1.6% of possible values. (Note:
This value is computed empirically from a distribution of random numbers as the SplittableRandom
uses an unsigned bit shift when computing transitions leading to overcounting by approximately
0.5. As such it cannot be modelled using a binomial distribution.) I have set the threshold
as 0.5% of possible values in the upper and lower 32bit separately. This requires at least
9 transitions. The result is approximately 1% of seeds are not good enough and must be adjusted.
This allows it to pass the tests in Core and Simple modules and a quick run using 8 trials
passes SmallCrush with no failures. So this seeding approach is working.
JMH results verses SplitMix:
randomSourceNameMethodScoreErrorMedianError/Score
BASELINEnextDouble2.6570.3392.5870.128
MSWSnextDouble5.3120.0065.3110.001
SPLIT_MIX_64nextDouble4.5390.0134.5370.003
 baselineDouble2.3490.3052.2870.130
 baselineVoid0.2690.0060.2670.022
BASELINEnextInt2.3630.0022.3630.001
MSWSnextInt3.2090.0033.2080.001
SPLIT_MIX_64nextInt3.5810.0443.5720.012
 baselineInt2.0450.0072.0430.003
 baselineVoid0.2660.0000.2660.000
Note using these results you can subtract the baseline from each score since it measures the
overhead within JMH for testing methods that produce a primitive value.
So not as fast for double generation but much faster for int generation.
Regarding the seeding.
The MSWS paper states that:
{noformat}
"The constant s should be an irregular bit pattern with roughly half of the
bits set to one and half of the bits set to zero. The best statistics have been
obtained with this type of pattern. Unduly sparse or dense values do not
produce good results."
{noformat}
So we could just seed the generator using any random number, as long as the number of bit
transitions is not too low. Defining the level for low is vague and currently the low value
is set at about 1% of all possible random numbers.
The paper suggests seeding using random hexadecimal digits:
{noformat}
"The digits are chosen so that the upper 8 are different and the lower 8 are
different and then 1 is ored into the result to make the constant odd."
{noformat}
The example source code provides a method to do this. The source code is GPL3 so cannot be
copied unless we get permission. But the method is described in the paper so could be reimplemented.
The seeding method uses an internal random generator to create the seed for the Weyl sequence.
So seeding is slow relative to just checking for complexity in the provided seed and boosting
it if necessary.
Note that the original code provides a set of 25000 example seeds using their method. This
can be analysed for bit transitions.
The histogram looks like this:
{noformat}
Transitions Count
19 2
20 0
21 18
22 0
23 129
24 0
25 632
26 0
27 1918
28 0
29 4083
30 0
31 5812
32 0
33 5770
34 0
35 3971
36 0
37 1951
38 0
39 573
40 0
41 113
42 0
43 25
44 0
45 3
{noformat}
Note that the method creates no seeds with an odd number of state transitions and no seeds
with less than 19 transitions. So using a cutoff for upper and lower 32bits of 9 transitions
is roughly equivalent.
Since the seeding routine ensures all 8 hex digits in the upper and lower 32bits are unique
the output seed will be assured of altering each 4 bit section of the Weyl sequence with each
iteration (with the exception of the use of the 0x0 hex digit).
The author's have also verified that their seeding routine is suitable for parallel computation
as the first 3 billion input values starting from 0 produce unique seeds.
Ideally I would like to avoid a complex seeding routine inside the core generator implementation.
Using the simple method of boosting the number of bit transitions is working.
If the generator is to be used for parallel computations then a user can create n generators
and just check that each input seed is unique. They can even be referred to the original paper
to produce seeds using that code. Currently my routine would modify 30 of those 25000 example
seeds (0.12%). The lowest transition count found in the 32bits is 6 which happens because
the final bit is forced to 1 (note the double 0xf at the end).
{noformat}
0xc 0x3 0x8 0x0 0x6 0x7 0xf 0xf
1100 0011 1000 0000 0110 0111 1111 1111
{noformat}
To avoid the modified input seed creating an non unique generator the unmodified seed can
be used to set the initial state of the generator. Thus all possible random seeds should create
a unique generator, although the sequences may overlap if the input seed has been modified.
Since we do not currently support construction of parallel generators that do not overlap
via the factory method in {{RandomSource.create}} I do not think it a priority to exactly
implement the original source seeding routine.
> Middle Square Weyl Sequence generator
> 
>
> Key: RNG85
> URL: https://issues.apache.org/jira/browse/RNG85
> Project: Commons RNG
> Issue Type: Subtask
> Components: core
> Reporter: Gilles
> Priority: Minor
> Labels: gsoc2019
>
> https://en.wikipedia.org/wiki/Middle_square_method#Middle_Square_Weyl_Sequence_PRNG

This message was sent by Atlassian JIRA
(v7.6.3#76005)
