commons-dev mailing list archives

From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [Rng] New XoShiRo generators
Date Wed, 06 Mar 2019 22:07:41 GMT
> 
> On 6 Mar 2019, at 21:42, Alex Herbert <alex.d.herbert@gmail.com> wrote:
> 
> 
> 
>> On 6 Mar 2019, at 21:24, Gilles Sadowski <gilleseran@gmail.com> wrote:
>> 
>> Hello.
>> 
>> On Wed, 6 Mar 2019 at 21:49, Alex Herbert <alex.d.herbert@gmail.com> wrote:
>>> 
>>> 
>>> 
>>>> On 6 Mar 2019, at 17:11, Gilles Sadowski <gilleseran@gmail.com> wrote:
>>>> 
>>>> Do the two variants produce uncorrelated sequences?
>>> 
>>> I will test this when I branch a new PR for just this code.
>> 
>> IMHO, it's strange that there would be 2 sources of randomness in a single
>> implementation.
>> Concretely: If one needs a fast "int" provider, and a fast "long" provider, I'd
>> consider the simpler solution of using 2 different providers.
> 
> I think this has crossed wires somewhere. I was talking about the variant
> of the XorShift1024Star algorithm and whether XorShift1024Star should be
> deprecated in favour of XorShift1024StarPhi.
> 
> The variant of the SplitMix64 algorithm for producing ints was tested in a
> benchmark that I am prepared to throw away. The results are in the Jira
> ticket. The way the SplittableRandom creates an int is slightly slower than
> the method used in [RNG] SplitMix64 which divides the long in half. This
> ticket can be closed as done and I’ll add a comment that no speed
> improvement was found.
> 
> I agree that this variant algorithm should have been in a new provider. It
> would produce a different output of bytes since the bit shift in the second
> step is different. But I’m not going to add this algorithm so it does not
> matter.
> 
> However I will test if XorShift1024Star and XorShift1024StarPhi are
> correlated just for completeness.
> 

I ran 100 repeats of a correlation between 50 longs from XorShift1024Star and
50 longs from XorShift1024StarPhi, using a new seed for each repeat:

SummaryStatistics:
n: 100
min: -0.30893547071559685
max: 0.37616626218398586
sum: 3.300079237520435
mean: 0.033000792375204355
geometric mean: NaN
variance: 0.022258533475114764
population variance: 0.022035948140363616
second moment: 2.2035948140363617
sum of squares: 2.312500043775496
standard deviation: 0.14919294043323486
sum of logs: NaN
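
For reference, a test of this kind could be sketched as below. This is only
a rough sketch under several assumptions that are not stated in the message:
that "correlation" means the Pearson coefficient, that both generators are
given the same seed within a repeat, and that the two multipliers are
1181783497276652981L and 0x9e3779b97f4a7c13L. The class and method names
(XorShiftCorrelationSketch, Gen, meanCorrelation) are hypothetical, and the
generator step is written inline rather than using the [RNG] classes.

```java
import java.util.SplittableRandom;

final class XorShiftCorrelationSketch {
    /** Multiplier assumed here for the original variant. */
    static final long MULT_STAR = 1181783497276652981L;
    /** Multiplier assumed here for the "Phi" variant. */
    static final long MULT_PHI = 0x9e3779b97f4a7c13L;

    /** Minimal xorshift1024-style generator; only the output multiplier varies. */
    static final class Gen {
        private final long[] state;
        private final long multiplier;
        private int index;

        Gen(long[] seed, long multiplier) {
            this.state = seed.clone(); // 16 longs, must not be all zero
            this.multiplier = multiplier;
        }

        long next() {
            final long s0 = state[index];
            long s1 = state[index = (index + 1) & 15];
            s1 ^= s1 << 31;
            state[index] = s1 ^ s0 ^ (s1 >>> 11) ^ (s0 >>> 30);
            return state[index] * multiplier; // wraps on overflow
        }
    }

    /** Pearson correlation coefficient of two equal-length samples. */
    static double pearson(double[] x, double[] y) {
        final int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sxy = 0;
        double sxx = 0;
        double syy = 0;
        for (int i = 0; i < n; i++) {
            final double dx = x[i] - meanX;
            final double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / (Math.sqrt(sxx) * Math.sqrt(syy));
    }

    /** Mean correlation over the given number of repeats, new seed each repeat. */
    static double meanCorrelation(int repeats, int samples, long seedOfSeeds) {
        final SplittableRandom seeder = new SplittableRandom(seedOfSeeds);
        double sum = 0;
        for (int r = 0; r < repeats; r++) {
            final long[] seed = seeder.longs(16).toArray();
            final Gen a = new Gen(seed, MULT_STAR);
            final Gen b = new Gen(seed, MULT_PHI);
            final double[] x = new double[samples];
            final double[] y = new double[samples];
            for (int i = 0; i < samples; i++) {
                x[i] = a.next(); // signed long widened to double
                y[i] = b.next();
            }
            sum += pearson(x, y);
        }
        return sum / repeats;
    }

    public static void main(String[] args) {
        System.out.println(meanCorrelation(100, 50, 12345L));
    }
}
```

The exact figures will differ from the statistics above because the seeds
differ; only the general pattern (a mean near 0) should be comparable.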

Note that the algorithm is the same except for the final step, where the
multiplier is used to scale the final output long:

   return state[index] * multiplier;

So if it were outputting a double, the correlation would be 1. But it is a
long generator, so the multiplication overflows and the result wraps around,
often to a negative value. As a result, the mean correlation is close to 0.
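
To illustrate the wrap-around argument, here is a small sketch that scales
the same values by two multipliers, once in double arithmetic and once in
wrapping long arithmetic, and compares the Pearson correlation of the two
outputs. Several things here are assumptions for illustration only: the
values from SplittableRandom stand in for state[index], the two multiplier
constants are the ones I believe the two variants use, and the names
(WrapCorrelationSketch, compare, unsignedToDouble) are hypothetical.

```java
import java.util.SplittableRandom;

final class WrapCorrelationSketch {
    /** Assumed multiplier of the original variant. */
    static final long MULT_STAR = 1181783497276652981L;
    /** Assumed multiplier of the "Phi" variant. */
    static final long MULT_PHI = 0x9e3779b97f4a7c13L;

    /** Interprets the 64 bits as an unsigned value and converts to double. */
    static double unsignedToDouble(long v) {
        return (v >>> 1) * 2.0 + (v & 1L);
    }

    /** Pearson correlation coefficient of two equal-length samples. */
    static double pearson(double[] x, double[] y) {
        final int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sxy = 0;
        double sxx = 0;
        double syy = 0;
        for (int i = 0; i < n; i++) {
            final double dx = x[i] - meanX;
            final double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / (Math.sqrt(sxx) * Math.sqrt(syy));
    }

    /**
     * Returns two correlations: index 0 for scaling in double arithmetic
     * (no overflow), index 1 for scaling in wrapping long arithmetic.
     */
    static double[] compare(int n, long seed) {
        final SplittableRandom rng = new SplittableRandom(seed);
        final double[] dx = new double[n];
        final double[] dy = new double[n];
        final double[] lx = new double[n];
        final double[] ly = new double[n];
        for (int i = 0; i < n; i++) {
            final long s = rng.nextLong(); // stand-in for state[index]
            final double sd = unsignedToDouble(s);
            dx[i] = sd * unsignedToDouble(MULT_STAR); // pure scaling
            dy[i] = sd * unsignedToDouble(MULT_PHI);
            lx[i] = s * MULT_STAR; // 64-bit product, high bits discarded
            ly[i] = s * MULT_PHI;
        }
        return new double[] {pearson(dx, dy), pearson(lx, ly)};
    }

    public static void main(String[] args) {
        final double[] r = compare(100000, 42L);
        System.out.println("double scaling: " + r[0]);
        System.out.println("long wrapping:  " + r[1]);
    }
}
```

In the double case both outputs are just positive multiples of the same
value, so they are perfectly correlated; in the long case only the low 64
bits of each product survive, which removes that linear relationship.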

A single repeat using 1,000,000 numbers has a correlation of 0.002.

Am I missing something here with this type of test?

>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>> 
> 

