commons-dev mailing list archives

From Alex Herbert <alex.d.herb...@gmail.com>
Subject Re: [Rng] New XoShiRo generators
Date Thu, 21 Mar 2019 12:41:12 GMT

On 21/03/2019 02:08, Gilles Sadowski wrote:
> On Wed, 20 Mar 2019 at 21:39, Alex Herbert <alex.d.herbert@gmail.com> wrote:
>>
>>
>>> On 20 Mar 2019, at 18:22, Gilles Sadowski <gilleseran@gmail.com> wrote:
>>>
>>>>> [...]
>>>>> Perhaps the "RandomStressTester" should be made more flexible
>>>>> and, instead of a hard-coded "GeneratorsList", accept a list of "enum"
>>>>> identifiers. e.g. a list (ASCII file) like:
>>>>> MT 3
>>>>> MT_64 3
>>>>> SPLIT_MIX_64 3
>>>>> [etc.]
>>>>> where the second column indicates how many runs to perform for
>>>>> this type of RNG.
>>>>> [A (mandatory) flag specifying this file would replace the first 2
>>>>> command-line arguments.]
> Correction: the second argument (number of threads) is still needed.
>
>>>> That would be nice.
>>>>
>>>> But:
>>>>
>>>> 1. The GeneratorsList class allows a user to use another list that is on
>>>> the Java classpath and test their own generators provided by that list.
>>>> That is, if I understand how Java can build things using reflection. The
>>>> likelihood of ever needing to do this is ... ? (very small)
>>> And they could still do that if they add to the "RandomSource" list…
>> Yes. So there is no need to worry about what others will do; just
>> support the named enums in RandomSource.
>>
>>>> 2. The GeneratorsList class can be auto-generated from the enum.
>>> There could be a provided input file, updated whenever "RandomSource"
>>> is.
>>> One advantage is that we could slightly expand the format to allow, e.g.:
>>>
>>> TWO_CMRES 3
>>> TWO_CMRES() 3
>>> TWO_CMRES(1,2) 3
>> I had wondered about supporting additional arguments to the
>> RandomSource.create method. Currently we only need to support Integer,
>> but support for Integer, Long, Float and Double could come from using
>> the canonical literal form, e.g. 0, 0L, 0f, 0.0. Perhaps we should just
>> cross that bridge when we come to it. An easier approach would be to
>> hard-code the handling of TWO_CMRES_SELECT so that it knows the values
>> inside the parentheses are two ints.
>>
>>>> I suppose it could be updated to require an input text list (as per your
>>>> example) but if called with no arguments it can print out a default text
>>>> list using the order of the RandomSource enum.
>>>>
>>>> Thus it should not need to be modified if more generators are added.
>>> I'm not sure I understand here.
>> If the program only needs to support RandomSource, it can easily
>> enumerate RandomSource and print out an example input file. Then there
>> is no need to provide an input file for the program, since it can write
>> a default one.
> If the user forgets to supply one, the program outputs one, and stops;
> then the user reissues the command?

Yes:

 > java -jar examples-stress.jar -h

Print something helpful

 > java -jar examples-stress.jar --template

Print a template generators list to stdout

 > java -jar examples-stress.jar --template > list.txt

 > java -jar examples-stress.jar target/tu_ 4 list.txt BE ./stdin2testu01 BigCrush
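A minimal sketch of what --template could do, in plain Java. It assumes a hypothetical Source enum standing in for RandomSource (the real program would loop over RandomSource.values()) and a hypothetical default of 3 runs per generator; each printed line follows the "NAME RUNS" format proposed for the generators list.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a "--template" option: list every generator in enum
 * declaration order, each followed by a default number of runs.
 */
public class TemplateWriter {

    /** Hypothetical stand-in for RandomSource (a small subset of its names). */
    enum Source { MT, MT_64, SPLIT_MIX_64, TWO_CMRES }

    /** Hypothetical default number of runs per generator. */
    static final int DEFAULT_RUNS = 3;

    /** Builds one "NAME RUNS" line per enum constant, in declaration order. */
    static List<String> templateLines(int runs) {
        List<String> lines = new ArrayList<>();
        for (Source source : Source.values()) {
            lines.add(source.name() + " " + runs);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Print to stdout so the user can redirect it, e.g. "> list.txt"
        templateLines(DEFAULT_RUNS).forEach(System.out::println);
    }
}
```

Because the template is derived from the enum, a generator added to the enum appears in the template without any change to this code.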


I've used picocli before. It definitely needs very little extra code due 
to the use of annotations.

One thing I do not know is what happens to the arguments for the stress 
test program, e.g.

/usr/bin/dieharder -a -g 200 -Y 1 -k 2

If they match any option used by the examples-stress.jar program, they
will be consumed by its parser. If our options clash with arguments that
must be passed to the stress test program, the executable would have to
be wrapped in a script. For now we can choose our options so that they do
not clash. That should be simple, provided we avoid the options used here:

./stdin2testu01 BigCrush

/usr/bin/dieharder -a -g 200 -Y 1 -k 2

So:

-h, --help => help

--template => print a template
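One way to keep the two sets of arguments apart, sketched in plain Java without picocli: split the raw argument array at a fixed position before any parsing, so everything after the stress test executable is handed over untouched. It assumes the executable is the fifth positional argument (the order proposed below) and that -h/--help and --template are checked before the split; StressArgs and its methods are hypothetical names. (picocli also treats a lone -- argument as the end of options, which would be an alternative at the cost of an extra token on the command line.)

```java
import java.util.Arrays;

/**
 * Sketch: split the command line at a fixed position so that the stress
 * test application's own options (e.g. dieharder's "-a -g 200") never
 * reach this program's parser.
 */
public class StressArgs {

    /** Positions 0-4: prefix, threads, generators list, endianness, executable. */
    static final int OWN_ARGUMENT_COUNT = 5;

    /** Arguments handled by this program (including the executable path). */
    static String[] ownArguments(String[] args) {
        checkLength(args);
        return Arrays.copyOfRange(args, 0, OWN_ARGUMENT_COUNT);
    }

    /** Arguments forwarded verbatim to the executable. */
    static String[] applicationArguments(String[] args) {
        checkLength(args);
        return Arrays.copyOfRange(args, OWN_ARGUMENT_COUNT, args.length);
    }

    private static void checkLength(String[] args) {
        if (args.length < OWN_ARGUMENT_COUNT) {
            throw new IllegalArgumentException(
                "Expected at least " + OWN_ARGUMENT_COUNT + " arguments, got " + args.length);
        }
    }

    public static void main(String[] args) {
        String[] example = {"target/tu_", "4", "list.txt", "LE",
                            "/usr/bin/dieharder", "-a", "-g", "200", "-Y", "1", "-k", "2"};
        System.out.println(Arrays.toString(ownArguments(example)));
        System.out.println(Arrays.toString(applicationArguments(example)));
    }
}
```

With this split, only the first five tokens would ever be given to the option parser, so the stress test program's options cannot be mistaken for ours.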

I would make all of these mandatory, as they are all important not to forget:

  * output file prefix
  * int threads
  * generators list
  * endianness (an enum of BE or LE)
  * application
  * application arguments

For picocli that would be:

@Parameters(index = "0")    File prefix;                  // output file prefix
@Parameters(index = "1")    int threadCount;              // number of threads
@Parameters(index = "2")    File generatorsList;          // generators list file
@Parameters(index = "3")    Endianness endianness;        // BE or LE
@Parameters(index = "4")    File executable;              // stress test application
@Parameters(index = "5..*") String[] executableArguments; // its arguments


So it is very simple. I will modify the updated program to use
picocli.
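One way the generators list file could be read is sketched below. It assumes the extended line format suggested in the quoted discussion (e.g. "TWO_CMRES_SELECT(1,2) 3") and types each constructor argument by its canonical Java literal form: 0 gives Integer, 0L Long, 0f Float and 0.0 Double. GeneratorsListParser, Entry, parseLine and parseArgument are hypothetical names; looking up the RandomSource and calling RandomSource.create with the arguments is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of reading one line of a generators list, e.g. "MT 3",
 * "TWO_CMRES() 3" or "TWO_CMRES_SELECT(1,2) 3".
 */
public class GeneratorsListParser {

    /** One parsed line: generator name, constructor arguments, number of runs. */
    static final class Entry {
        final String name;
        final Object[] args;
        final int runs;

        Entry(String name, Object[] args, int runs) {
            this.name = name;
            this.args = args;
            this.runs = runs;
        }
    }

    static Entry parseLine(String line) {
        String trimmed = line.trim();
        // The run count is the last whitespace-separated token
        int split = Math.max(trimmed.lastIndexOf(' '), trimmed.lastIndexOf('\t'));
        if (split < 0) {
            throw new IllegalArgumentException("Expected '<generator> <runs>': " + line);
        }
        String spec = trimmed.substring(0, split).trim();
        int runs = Integer.parseInt(trimmed.substring(split + 1));

        int open = spec.indexOf('(');
        if (open < 0) {
            return new Entry(spec, new Object[0], runs);   // plain "NAME RUNS"
        }
        if (!spec.endsWith(")")) {
            throw new IllegalArgumentException("Unbalanced parentheses: " + line);
        }
        String name = spec.substring(0, open).trim();
        String inner = spec.substring(open + 1, spec.length() - 1).trim();
        List<Object> args = new ArrayList<>();
        if (!inner.isEmpty()) {
            for (String token : inner.split(",")) {
                args.add(parseArgument(token.trim()));
            }
        }
        return new Entry(name, args.toArray(), runs);
    }

    /** Types a decimal literal by its canonical form: 0, 0L, 0f, 0.0. */
    static Object parseArgument(String token) {
        if (token.isEmpty()) {
            throw new IllegalArgumentException("Empty argument");
        }
        char last = token.charAt(token.length() - 1);
        String digits = token.substring(0, token.length() - 1);
        if (last == 'L' || last == 'l') {
            return Long.valueOf(digits);
        }
        if (last == 'F' || last == 'f') {
            return Float.valueOf(digits);
        }
        if (token.indexOf('.') >= 0 || token.indexOf('e') >= 0 || token.indexOf('E') >= 0) {
            return Double.valueOf(token);
        }
        return Integer.valueOf(token);
    }

    public static void main(String[] args) {
        Entry entry = parseLine("TWO_CMRES_SELECT(1,2) 3");
        System.out.println(entry.name + " " + Arrays.toString(entry.args) + " x" + entry.runs);
    }
}
```

Hard-coding "two ints for TWO_CMRES_SELECT" would be simpler still; the literal-suffix approach only pays off if other generators with typed arguments are added.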


For reference, here are the results of BigCrush with:

The correct little-endian byte order:

XorShiftXorComposite : 54, 53, 53 : 646.8 +/- 10.9
XorShiftSerialComposite : 40, 39, 39 : 608.2 +/- 3.9
SplitXorComposite : 0, 0, 0 : 625.8 +/- 0.2

The incorrect big-endian byte order:

XorShiftXorComposite : 92, 89, 90 : 986.7 +/- 4.3
XorShiftSerialComposite : 75, 74, 76 : 632.0 +/- 2.3

(I did not run the control.)

This makes a fair bit of difference, as it did for dieharder, so getting
the byte order right is important: if the bytes are reversed, you are not
testing the true output of the generator.
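A small Java sketch of why reversed bytes change what is tested. It assumes, as the results above suggest for this setup, that the reader at the other end of the pipe interprets every 4 bytes as a little-endian int. The Endianness enum mirrors the BE/LE parameter proposed earlier but is hypothetical, as are toBytes and readLittleEndian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/**
 * Sketch: the same 32-bit value written in the two byte orders, and what a
 * little-endian reader sees when given the big-endian bytes.
 */
public class ByteOrderDemo {

    /** Hypothetical CLI enum mapping BE/LE onto java.nio.ByteOrder. */
    enum Endianness {
        BE(ByteOrder.BIG_ENDIAN), LE(ByteOrder.LITTLE_ENDIAN);

        final ByteOrder order;

        Endianness(ByteOrder order) {
            this.order = order;
        }
    }

    /** Serialises one int output of a generator using the given byte order. */
    static byte[] toBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    /** Reinterprets 4 bytes as a little-endian int, as the assumed reader would. */
    static int readLittleEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        int value = 0x01020304;
        byte[] le = toBytes(value, Endianness.LE.order);
        byte[] be = toBytes(value, Endianness.BE.order);
        System.out.println(Arrays.toString(le)); // [4, 3, 2, 1]
        System.out.println(Arrays.toString(be)); // [1, 2, 3, 4]
        // The reader recovers the generator's value only from the little-endian bytes
        System.out.println(readLittleEndian(le) == value);                       // true
        System.out.println(readLittleEndian(be) == Integer.reverseBytes(value)); // true
    }
}
```

With the wrong order, the test suite effectively sees Integer.reverseBytes applied to every output, which is a different sequence from the one the generator produces.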


Alex


