commons-dev mailing list archives

From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [Rng] New XoShiRo generators
Date Tue, 19 Mar 2019 09:26:44 GMT


> On 19 Mar 2019, at 10:35, Gilles Sadowski <gilleseran@gmail.com> wrote:
> 
>>> [...]
>>>> So limit the testing to ints and document in the user guide that this is
>>>> what we are testing.
>>> 
>>> +1
>> 
>> OK. That seems simplest.
>> 
>> Given all the stress tests will be rerun, shall I go ahead and reorder the existing
>> files, user guide .apt file and the GeneratorsList to be in the order of the RandomSource
>> enum?
> 
> We could wait for the new results before updating the site.

I was going to rearrange it all and check that all the links in the local site are OK. I have
this scripted but have not yet run it. When new results are ready they can be written over
the existing ones. I am fine either way, so let’s leave it until the new results are done
and then check the site.

I will update the GeneratorsList to be autogenerated from the RandomSource enum.

> 
>> 
>> 
>> Big/Little Endian for Dieharder:
>> 
>> I’ve spent some time looking at the source code for Dieharder. It reads binary
>> file data using this (taken from libdieharder/rng_file_input_raw.c):
>> 
>> unsigned int iret;
>> // ...
>> fread(&iret,sizeof(uint),1,state->fp);
>> 
>> So it reads single unsigned integers using fread().
>> 
>> Given that it is possible to run Dieharder using numbers from ASCII and binary input
>> files, I set up a test. I created the files using an RNG with the same seed, with the
>> standard output from a DataOutputStream and the byte-reversed output using
>> Integer.reverseBytes. Here’s what happens:
>> 
>>> dieharder -g 201 -d 0 -f raw.bin.rev
>>   diehard_birthdays|   0|       100|     100|0.89220858|  PASSED
>>> dieharder -g 202 -d 0 -f raw.txt
>>   diehard_birthdays|   0|       100|     100|0.89220858|  PASSED
>> 
>>> dieharder -g 201 -d 0 -f raw.bin
>>   diehard_birthdays|   0|       100|     100|0.30776452|  PASSED
>>> dieharder -g 202 -d 0 -f raw.txt.rev
>>   diehard_birthdays|   0|       100|     100|0.30776452|  PASSED
>> 
>>> cat raw.bin | dieharder -g 200 -d 0
>>   diehard_birthdays|   0|       100|     100|0.30776452|  PASSED
>> 
>> 
>> Note the reversed byte sequence (.rev suffix) is required to get the same results
>> from the binary (.bin) file as from the text (.txt) file.
>> 
>> So the binary read of Dieharder is using the little endian representation, as was
>> required for TestU01.
>> 
>> I had modified the stdin2testu01.c bridge to detect if the system was little endian
>> and then correct the input data by reversing the bytes. It may be a better idea to write a
>> test C program to detect the endianness of the system for reference, and then update the
>> stress test benchmark to have an argument for little or big endian output when piping the
>> int data to the command line program.
>> 
>> I think it is important to get the endianness of the data correct. Dieharder, at least,
>> runs tests using tuples of bits from the data which can span multiple bytes. For example
>> the sts_serial test (-d 102) uses overlapping n-tuples of bits with n from 1 to 16. Other
>> tests using non-overlapping tuples, such as rgb_bitdist (-d 200), use n from 1 to 12.
>> 
>> Reversing the bytes in the Java code is the easiest option.
> 
> +1
> [With an option flag for selecting whether the output should be BE or LE.]
> 

OK. I will consolidate all this and update the stress_test.md instructions to make it clear
that endianness needs to be considered.
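
As a rough sketch of the BE/LE option discussed above (the class and method names are
hypothetical and not part of the code base): DataOutputStream.writeInt always writes the
high byte first, so little endian output only needs Integer.reverseBytes applied before
the write. ByteOrder.nativeOrder() reports the byte order of the platform running the JVM,
which is the order a C fread() into an unsigned int will assume.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only.
class EndianIntWriter {
    /** Serialises each int as 4 bytes in the requested byte order. */
    static byte[] toBytes(int[] values, ByteOrder order) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(values.length * 4);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                // writeInt is big endian; reverse first when little endian is requested.
                out.writeInt(order == ByteOrder.LITTLE_ENDIAN ? Integer.reverseBytes(v) : v);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        int[] sample = {0x01020304};
        byte[] be = toBytes(sample, ByteOrder.BIG_ENDIAN);
        byte[] le = toBytes(sample, ByteOrder.LITTLE_ENDIAN);
        System.out.printf("BE: %02x %02x %02x %02x%n", be[0], be[1], be[2], be[3]);
        System.out.printf("LE: %02x %02x %02x %02x%n", le[0], le[1], le[2], le[3]);
        System.out.println("Native order: " + ByteOrder.nativeOrder());
    }
}
```

In the stress test application the byte order would come from a command line flag rather
than being hard coded as it is in this main method.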

Should I add the raw data dumper to the source base? This runs a named RandomSource for a
given number of iterations with a provided seed and outputs 4 files: Dieharder text format
and raw binary, with standard order and byte reversed. It may be useful if debugging the output
of RNGs ever needs to be done again.
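
For reference, the dumper does roughly the following. This is a simplified sketch, not
the actual tool: java.util.Random stands in for a generator created from a named
RandomSource, the class, method and file names are illustrative, and the text header
(type, count, numbit) is what I understand Dieharder's ASCII file input to expect, so
check it against the Dieharder documentation before relying on it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

// Hypothetical sketch of a raw data dumper.
class RawDataDumper {
    /** Raw binary: 4 bytes per int, high byte first, optionally byte reversed. */
    static byte[] rawBytes(int[] values, boolean reverse) {
        // A newly allocated ByteBuffer is big endian by default.
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buffer.putInt(reverse ? Integer.reverseBytes(v) : v);
        }
        return buffer.array();
    }

    /** Text: assumed Dieharder header, then one unsigned 32-bit value per line. */
    static String dieharderText(int[] values, long seed, boolean reverse) {
        StringBuilder sb = new StringBuilder();
        sb.append("# seed = ").append(seed).append('\n');
        sb.append("type: d\n");
        sb.append("count: ").append(values.length).append('\n');
        sb.append("numbit: 32\n");
        for (int v : values) {
            sb.append(Integer.toUnsignedString(reverse ? Integer.reverseBytes(v) : v)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        long seed = args.length > 0 ? Long.parseLong(args[0]) : 1L;
        int count = args.length > 1 ? Integer.parseInt(args[1]) : 1000;
        Random rng = new Random(seed); // stand-in for the named RandomSource
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = rng.nextInt();
        }
        Path dir = Paths.get(".");
        // Four files: text and binary, each in standard and byte-reversed order.
        Files.write(dir.resolve("raw.txt"),
            dieharderText(values, seed, false).getBytes(StandardCharsets.US_ASCII));
        Files.write(dir.resolve("raw.txt.rev"),
            dieharderText(values, seed, true).getBytes(StandardCharsets.US_ASCII));
        Files.write(dir.resolve("raw.bin"), rawBytes(values, false));
        Files.write(dir.resolve("raw.bin.rev"), rawBytes(values, true));
    }
}
```

The four files can then be compared pairwise with dieharder -g 201 (raw binary) and
-g 202 (text) to see which byte order the binary reader expects on a given machine.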

Alex


