commons-dev mailing list archives

From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [Rng] New XoShiRo generators
Date Mon, 18 Mar 2019 09:19:42 GMT


> On 18 Mar 2019, at 10:20, Gilles Sadowski <gilleseran@gmail.com> wrote:
> 
> On Sun, 17 Mar 2019 at 01:01, Alex Herbert <alex.d.herbert@gmail.com> wrote:
>> 
>> 
>> 
>>> On 16 Mar 2019, at 23:10, Alex Herbert <alex.d.herbert@gmail.com> wrote:
>>> 
>>> 
>>> 
>>>> On 16 Mar 2019, at 02:54, Gilles Sadowski <gilleseran@gmail.com> wrote:
>>>>> This is read by dieharder which directly reads from stdin. This worked
to collect all the generated bits and the serial and xor composites failed the test suite.
>>>>> 
>>>>> It is also read by the stdin2testu01.c program to pass to TestU01.
>>>>> 
>>>>> What is happening is that the stdin2testu01.c is reading 64-bits using
an unsigned long.
>>>> 
>>>> I don't remember why I wrote that, but as you pointed out, it now looks
>>>> like a plain bug.
>>> 
>>> It may be more complicated again...
>>> 
>>> I’ve had a play around with the data being pushed through to the testU01 library
using the c bridge. I wanted to check that the int value that is generated by the RNG is passed
through to the c program. So I wrote a simple BridgeTester class to do this. It writes all
the int values to a data file (for reference) then passes them to the c executable with the
same method as the RandomStressTester. I then modified the stdin2testu01.c program to have
an extra hidden debug mode where all the data is just written to stdout.
>>> 
>>> I found the data file written from Java did not match the data that the c program
had. A bit more digging found that the problem was that Java writes the data in big endian byte
order and the c program reads it as little endian. This is true on my linux and Mac OSX platforms.
So the raw bytes read from stdin are in the wrong order.
>>> 
>>> When I updated the program to self detect endianness and swap the byte order
of each set of 4 bytes from the stdin then the data in the c program matched the original.
>>> 
>>> Since it was non destructive to the module I added all this to master. You can
see this working by rebuilding the c bridge and running the new profile to test it:
>>> 
>>>> cd commons-rng-examples/examples-stress
>>>> gcc src/main/c/stdin2testu01.c -o stdin2testu01 -ltestu01 -ltestu01probdist -ltestu01mylib -lm
>>>> mvn test -P bridge
>>> 
>>> You should see two files:
>>> 
>>> target/bridge.data
>>> target/bridge.out
>>> 
>>> These should have the same contents. The .data file is written by the java program,
and the .out file is the stdout captured from the c program with its view of the data.
>>> 
>>> This should fix running TestU01.
>>> 
>>> BUT I’ve not had time to determine how Dieharder is reading the stdin. Given
it is a c library it may be reading it using little endian as well. I’ll look into that
next.
>>> 
>>> Composite update:
>>> 
>>> For some reason all my BigCrush simulations crashed. It could be a RAM issue.
The runs did take longer than expected but I did not monitor memory usage. I’ve started
them again but using only the serial composite. I think the xor one is really broken.
>>> 
>>> FYI. Using the new bridge code with 3 runs of SmallCrush finds [6, 6, 6] / 15
failed tests for the serial composite and [9, 9, 10] / 15 for the xor composite.
>>> 
>>> I’m expecting BigCrush to fail a lot. I’m now more interested in seeing if
it will complete.
>>> 
>>> Alex
>>> 
>> 
>> 
>> PS. Thinking about the endianness, it might not matter. The test suite ideally will
be able to detect if the bits are not random in either the least or the most significant byte of
the 32 bits, i.e. it should always find a problem. I am not clear if this is the case. I have
read that some generators can pass BigCrush but fail if the bits are reversed (not the bytes
but the bits). I’m happy to think that endianness is not an issue.
>> 
>> It was a good exercise in checking whether the bridge was working though.
>> 
>> One actual issue is that we are testing long providers using the long to create 2
int values. Should we test using a series of the upper 32 bits and then a series of the lower
32 bits?
> 
> Is that useful since the test now sees the integers as they are produced (i.e. 2
> values per long)?
> 

It is not relevant if you are concerned about int quality. But if you are concerned about
long quality then it is relevant. The long quality is important for the quality of nextDouble(),
although in that case only the upper 53 bits of the long are used. This means that the quality of a
long from an int provider is also not covered by the benchmark, as that would require testing
alternating ints twice, using the series 1, 3, 5, …, 2n+1 and the series 2, 4, 6, …, 2n.
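
To make the distinction concrete, here is a minimal Java sketch. It is not code from the stress test module; the class and method names are hypothetical. It shows how one long can be viewed as two int values, and one common way to build a double in [0, 1) from the upper 53 bits only. Whether the RNG's own nextDouble() uses exactly this construction is an assumption.

```java
public class LongBitsSketch {
    /** Upper 32 bits of a long as an int. */
    static int upper(long v) {
        return (int) (v >>> 32);
    }

    /** Lower 32 bits of a long as an int. */
    static int lower(long v) {
        return (int) v;
    }

    /** A double in [0, 1) built from the upper 53 bits only (assumed construction). */
    static double toDouble(long v) {
        return (v >>> 11) * 0x1.0p-53;
    }

    public static void main(String[] args) {
        long v = 0x0123456789abcdefL;
        // The two int series a long-quality test would need to examine separately.
        System.out.printf("upper=%08x lower=%08x%n", upper(v), lower(v));
        // Flipping only the lowest 11 bits leaves the double unchanged,
        // so those bits have no influence on the quality of the double.
        System.out.println(toDouble(v) == toDouble(v ^ 0x7ffL));
    }
}
```

A test of long quality would feed the series of upper() values and the series of lower() values to the suite as two separate runs.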

Given that half of the int values were previously discarded from the BigCrush analysis, the
current results on the user guide page actually represent BigCrush running on the upper 32-bits
of the long, byte reversed due to the big/little endian interpretation of the bytes in Java
and linux. 
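
The byte reversal can be reproduced in isolation. This is a minimal Java sketch, not the bridge code, and the class and method names are hypothetical. It assumes the int is written most significant byte first (as DataOutputStream.writeInt and a default ByteBuffer both do) and uses a little-endian ByteBuffer to emulate how a little-endian reader would interpret the same 4 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianSketch {
    /** Serialise an int most significant byte first (big endian). */
    static byte[] writeBigEndian(int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(value).array();
    }

    /** Interpret 4 raw bytes the way a little-endian reader would. */
    static int readLittleEndian(byte[] raw) {
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        int original = 0x01020304;
        int seen = readLittleEndian(writeBigEndian(original));
        System.out.printf("written=%08x seen=%08x%n", original, seen);
        // Swapping the byte order recovers the value that was written.
        System.out.printf("restored=%08x%n", Integer.reverseBytes(seen));
    }
}
```

The value seen by the reader is the original with its 4 bytes in reverse order; the bits within each byte keep their order.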

So maybe an update to the RandomStressTester to support analysis for int or long quality
is needed. For now the quality section on the website should just state that the quality is
for the ‘nextInt()’ method of the RNG.

I have the results of BigCrush using the new bridge c program:

XorShiftSerialComposite : 40, 39, 39 : 608.2 +/- 3.9

So it fails.

The XorShiftXorComposite crashed after 2 hours with about 1/4 of the results file complete. I am
running again so I can monitor it for memory usage. Something in the BigCrush suite just cannot
handle this generator output.

Alex


> Gilles
> 
>> I may set an unused workstation on this task to see what happens.
>> 
>> Alex
> 
