commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [rng] Releasing 1.3
Date Mon, 10 Jun 2019 17:21:46 GMT

On 10/06/2019 18:05, Gilles Sadowski wrote:
> Hi.
>
> Le lun. 10 juin 2019 à 18:33, Alex Herbert <alex.d.herbert@gmail.com> a écrit
:
>>
>> On 10/06/2019 17:18, Gilles Sadowski wrote:
>>> Le lun. 10 juin 2019 à 17:56, Alex Herbert <alex.d.herbert@gmail.com>
a écrit :
>>>> On 10/06/2019 16:34, Gilles Sadowski wrote:
>>>>> Hello.
>>>>>
>>>>> Le lun. 10 juin 2019 à 17:17, Alex Herbert <alex.d.herbert@gmail.com>
a écrit :
>>>>>> On 10/06/2019 15:31, Gilles Sadowski wrote:
>>>>>>>>> P.S. Thinking of releasing 1.3?
>>>>>>>> Not yet. I think there are a few outstanding items that work
together
>>>>>>>> for the multi-threaded focus of the new code and the new
generators:
>>>>>>> Sure but some of them could be postponed, if just to RERO.
>>>>>>>
>>>>>>>> - RNG-98: LongJumpable (easy)
>>>>>>>>
>>>>>>>> - RNG-102: SharedStateSampler (lots of easy work)
>>>>>>>>
>>>>>>>> - RNG-106: XorShiRo generators require non-zero input seeds
>>>>>>>>
>>>>>>>> (I'm still thinking about the best way to do this. The Jira
ticket
>>>>>>>> suggests a speed test to at least know the implications of
different ideas.)
>>>>>>> This is only when using the "SeedFactory" (?).  [Otherwise, it's
the
>>>>>>> user's responsibility to construct an appropriate seed.]
>>>>>>>
>>>>>>> Couldn't we just check that the output of the internal generator
is not
>>>>>>> all zero (and call it again if it is)?
>>>>>> Yes. The worse case scenario is a 1 in 2^64 collision rate with zero.
>>>>>> All other generators have larger state sizes. So this would be fine.
An
>>>>>> alternative would be to set a single bit to non zero. This throws
away 1
>>>>>> bit of randomness from the seed and will always work without any
>>>>>> recursion. But it makes the seed worse. The ideas are in the header
for
>>>>>> this Jira ticket:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/RNG-106
>>>>>>
>>>>>> I'll fix this soon.
>>>>>>
>>>>>> The other item I did not mention is outcome from RNG-104. This seems
to
>>>>>> indicate that using System.identityHashCode(new Object()) is not
as good
>>>>>> a mixer as a ThreadLocal random generator, both for speed and also
>>>>>> quality. I'm currently testing Well44497b ^ SplitMix in BigCrush
but I
>>>>>> think this should replace the identity hash code method.
>>>>> Didn't you also suggest to use XOR_SHIFT_1024_PHI (given the
>>>>> large enough period, better speed score on BigCrush)?
>>>> Yes. I'm still thinking about this. My initial calculations were based
>>>> on the length of time it would take to sample all the seeds from the
>>>> generator. Below we consider the number of possible seeds required and
>>>> produced:
>>>>
>>>> The Well4497b seed size is 1391 of ints so there are (2^32)^1391
>>>> possible seeds of random bits, or 2^1423.
>>>>
>>>> The XorShift1024 period is (2^1024 -1) of longs so there are (2^64)^1024
>>>> random bits output before repeat (ideally as the period may be shorter).
>>>> So the bit output is max 2^1088 before repeat.
>>>>
>>>> Assuming the output from XorShift1024 is truly random-per-bit it cannot
>>>> output enough unique seeds to cover all those required by the Well44497b
>>>> generator.
>>>>
>>>> But currently we seed using a maximum of 128 values and leave the rest
>>>> to the self-seeding routine in the base generator. For a long array this
>>>> is (2^64)^128 = 2^192, and only 2^160 for int arrays. So the
>>>> XorShift1024 generator can output enough bits to create all the seeds
>>>> that the SeedFactory currently produces.
>>> I'd think (roughly) that's a good enough compromise for a
>>> "convenience" routine. [Stringent requirements can be met
>>> explicitly otherwise.]
>> Sorry. I've done that wrong:
>>
>> (a^b)^c = a^(b^c) not a^(b+c)
>>
>> Well44497b seed size is 1391 of ints so there are 32*1391 (44512) bits
>> in the seed, and so 2^44512 possible seeds.
>>
>> XorShift1024 period is (2^1024 -1) of longs so there are 64*2^1024
>> random bits output before repeat, or 2^6 * 2^1024 = 2^1030.
>>
>> Not enough.
>>
>> The max seed size is long[128] which is 64*128 bits in the seed, and so
>> 2^8192 possible seeds.
>>
>> Still not enough.
>>
>> So even though it would take 2^969 years to repeat the period of a
>> XorShift1024 generator if sampled 10 billion time a second [1], it
>> cannot produce every possible seed currently required.
>>
>> So this is a dilemma. Choose a generator that can theoretically output
>> all the seeds required, even though you could never use them, or choose
>> a faster generator that can still output more seeds than you could
>> possibly use.
> How about using both?
> Generate the first element of the seed array (or the single "long" seed)
> from Well44497b and the rest from XorShift1024?

I think this just splits the generation of a seed of n bits into two 
seeds of x and n-x bits. So if the requirement is to generate an array 
long[128] you then have long + long[127]. And the number of combinations 
of long[127] is still too big for XorShift1024.

I'd be happy with XorShift1024 as it cannot realistically be repeated 
and outputs better seed quality. It could be documented that using 
SecureRandom to access the system pool of entropy would produce a high 
quality seed but would be slow and ThreadLocalRandom can produce the 
seed faster but with a short period. Perhaps it could be tied to 
Abhishek's demonstration of the speed of the SecureRandom algorithms in 
Java 9 on a new page in the user guide?

I'm running the benchmarks again with both generators to see what the 
performance effect is of switching. I'll revisit this when I have the 
results.

>
> Gilles
>
>> [1]
>> https://issues.apache.org/jira/browse/RNG-104?focusedCommentId=16857619&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16857619
>>
>>>> So a switch to using XorShift1024 would satisfy current requirements
>>>> and, given it passes BigCrush, would remove the requirement to mix the
>>>> output with a second generator. (IIUC the purpose is to improve
>>>> randomness of Well44497.)
>>> Is it really necessary to have the utmost randomness for generating
>>> seeds?  [It seems that the (rare) correlations of a seed used by some
>>> RNG instance with a seed used by some other instance will be "diluted"
>>> in the different sequences generated by the two instances.]
>>>
>>> Whatever, replacing with XorShift1024 will gain on that too, and
>>> as you note, keep the SeedFactory simpler (no mix).
>>>
>>> Regards,
>>> Gilles
>>>
>>>>> [...]
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message