commons-dev mailing list archives

From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [Rng] New XoShiRo generators
Date Tue, 19 Mar 2019 13:19:28 GMT

On 19/03/2019 11:38, Gilles Sadowski wrote:
> On Tue, 19 Mar 2019 at 10:26, Alex Herbert <alex.d.herbert@gmail.com> wrote:
>>
>>
>>> On 19 Mar 2019, at 10:35, Gilles Sadowski <gilleseran@gmail.com> wrote:
>>>
>>>>> [...]
>>>> Given all the stress tests will be rerun shall I go ahead and reorder the
>>>> existing files, user guide .apt file and the GeneratorsList to be in the
>>>> order of the RandomSource enum?
>>> We could wait for the new results before updating the site.
>> I was going to rearrange it all and check that all the links in the local
>> site are OK. I have this scripted but have not yet run it.
> Are you going to upload this script to the repository?

I wasn't going to. I've put it into my branch here:

https://github.com/aherbert/commons-rng/blob/userguide-rename/rename.pl

It generates two files that should do all the rearrangement. It's a work 
in progress. I've just tried it out and it seems to work, although I've 
not looked at the generated site.

The files are moved with 'git mv', so 'git log -M --summary' shows the 
renames, e.g.

commit c8b8903c00ab6d2c1403667048f27d9cbad4de46
Author: aherbert <aherbert@apache.org>
Date:   Tue Mar 19 12:14:13 2019 +0000

     Updated stress test results files

  rename src/site/resources/txt/userguide/stress/dh/run_1/{dh_K => dh_10} (100%)
  rename src/site/resources/txt/userguide/stress/dh/run_1/{dh_L => dh_11} (100%)
  rename src/site/resources/txt/userguide/stress/dh/run_1/{dh_M => dh_12} (100%)
  rename src/site/resources/txt/userguide/stress/dh/run_1/{dh_J => dh_13} (100%)
  rename src/site/resources/txt/userguide/stress/dh/run_1/{dh_C => dh_2} (100%)


However it is probably easiest to leave it as is and have the source 
repo results files out of sync with the GeneratorsList until the next 
benchmark results are done.


>
>> When new results are ready they can be written over the existing ones. I am
>> fine either way. So let’s leave it until new results have been done and
>> then check the site.
>>
>> I will update the GeneratorsList to be autogenerated from the RandomSource enum.
> Thanks.
> Let me know when everything is in place, and I'll try and start a stress test
> run on my side.

OK.

I am currently rerunning the dieharder test for the XorShift1024Star 
composites since that requires a little endian format on my machine. So 
far there are not as many failures when the byte order is reversed.
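
As a minimal sketch of the byte reversal under discussion (this is not the 
code in the stress test module, and the class and method names are made up 
for illustration), the JDK's Integer.reverseBytes swaps the four bytes of 
an int. Writing the reversed value in big endian order gives the same 
bytes as writing the original value in little endian order:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical class name; not part of commons-rng.
final class ByteOrderSketch {
    /** Encodes an int as 4 bytes using the given byte order. */
    static byte[] toBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        // JDK method: reverses the order of the four bytes of the int.
        int reversed = Integer.reverseBytes(value);
        System.out.printf("%08x -> %08x%n", value, reversed);

        // Reversed value written big endian vs original written little endian.
        byte[] a = toBytes(reversed, ByteOrder.BIG_ENDIAN);
        byte[] b = toBytes(value, ByteOrder.LITTLE_ENDIAN);
        System.out.println(Arrays.equals(a, b));
    }
}
```

Java's DataOutputStream always writes big endian, so reversing each int 
before writing is one way to produce little endian output without 
changing the writer.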

Once that is done I think we can wrap this up by:

- updating the stress test to support little/big endian format as input 
for the test suite

- updating the stress test GeneratorsList to match the RandomSource enum 
order

- merging the modified XorShift1024StarPhi generator

- deprecating the XOR_SHIFT_1024_S enum in favour of XOR_SHIFT_1024_S_PHI

- merging the new XoShiRo generators

Then it should be ready for a new stress test benchmark.
>>>>
>>>> Big/Little Endian for Dieharder:
>>>>
>>>> [...]
>>>>
>>>> Reversing the bytes in the Java code is the easiest option.
>>> +1
>>> [With an option flag for selecting whether the output should be BE or LE.]
>>>
>> OK. I will consolidate all this and update the stress_test.md instructions
>> to make it clear that endianness needs to be considered.
>>
>> Should I add the raw data dumper to the source base? This runs a named
>> RandomSource for a given number of iterations with a provided seed and
>> outputs 4 files: Dieharder text format and raw binary, with standard order
>> and byte reversed. It may be useful if debugging the output of RNGs ever
>> needs to be done again.
> Sure.  Can this be also added as an option to the "RandomStressTester"
> class?  E.g. with a flag like
>    --dump file_prefix,sequence_length
> where
>    "file_prefix" is the basename of the output files, and
>    "sequence_length" is the number of ints to generate.

The RandomStressTester uses a list of generators. I built the 
RawDataDumper to run using a maven profile where it works for a single 
named RandomSource. Arguments are:

RandomSource name, long seed, sequence length, file prefix.

So all the functionality is there. If the file is included in the shaded 
jar you should be able to do:

 > java -cp examples-stress.jar org.apache.commons.rng.examples.stress.RawDataDumper SPLIT_MIX_64 123L 1000 splitmix.out

Instead of running in a maven profile it could be built into a shaded 
package to allow:

 > java -jar raw-data-dumper.jar SPLIT_MIX_64 123L 1000 splitmix.out


I think the functionality is better as two programs to avoid doing too 
much in the RandomStressTester. Also I do not see the need to dump 
output from a list of 15+ generators.

I can add flag arguments to be used to specify which file to write:

.dh (text output in the dieharder format, which uses unsigned ints)

.txt (text output)

.raw (raw binary output)

.bin (2s complement binary output)

.hex (text output using hex)
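
To illustrate how the text formats differ for the same int, here is a 
rough sketch. The class and method names are hypothetical, and the header 
layout ("type: d", "count", "numbit") is my understanding of Dieharder's 
text file input, so treat it as an assumption to check against the 
Dieharder documentation:

```java
// Hypothetical helper; names are illustrative only.
final class OutputFormatSketch {
    /** Header in the style of Dieharder's text file input (assumed layout). */
    static String dieharderHeader(int count) {
        return "type: d\ncount: " + count + "\nnumbit: 32\n";
    }

    /** .dh style: the int printed as an unsigned 32-bit value. */
    static String dieharderLine(int value) {
        return Integer.toUnsignedString(value);
    }

    /** .txt style: the int printed as a signed value. */
    static String textLine(int value) {
        return Integer.toString(value);
    }

    /** .hex style: 8 hex digits, zero padded. */
    static String hexLine(int value) {
        return String.format("%08x", value);
    }

    public static void main(String[] args) {
        int[] sample = {1, -1, 0x7fffffff};
        System.out.print(dieharderHeader(sample.length));
        for (int v : sample) {
            System.out.println(dieharderLine(v));
        }
        for (int v : sample) {
            System.out.println(textLine(v) + " " + hexLine(v));
        }
    }
}
```

The main difference is in negative values: the signed text output prints 
-1 where the unsigned output prints 4294967295.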


And options for:

- int output

- long output

- byte reversed

- bit reversed

You then have a utility for dumping output of any random source to file 
in a variety of formats.

Although long output is not needed for the test suites it is useful for 
native long generators.
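
For the long and bit reversed options, a sketch using only JDK methods 
could look like the following. The class name is hypothetical, and 
splitting a long into high and low ints is just one possible convention, 
not necessarily what the library or the dumper does:

```java
// Hypothetical helper; names are illustrative only.
final class LongOutputSketch {
    /** Upper 32 bits of the long. */
    static int high(long value) {
        return (int) (value >>> 32);
    }

    /** Lower 32 bits of the long. */
    static int low(long value) {
        return (int) value;
    }

    public static void main(String[] args) {
        long value = 0x0102030405060708L;
        System.out.printf("high=%08x low=%08x%n", high(value), low(value));
        // Byte reversed: the order of the 8 bytes is swapped.
        System.out.printf("byte reversed=%016x%n", Long.reverseBytes(value));
        // Bit reversed: the order of all 64 bits is swapped.
        System.out.printf("bit reversed=%016x%n", Long.reverse(value));
        // The same operations exist for int.
        System.out.printf("int bit reversed=%08x%n", Integer.reverse(1));
    }
}
```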

WDYT?

Alex


>
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>


