commons-dev mailing list archives

From Alex Herbert <>
Subject Re: [Rng] New XoShiRo generators
Date Wed, 20 Mar 2019 18:03:48 GMT

On 20/03/2019 16:13, Gilles Sadowski wrote:
> On Wed, 20 Mar 2019 at 15:04, Alex Herbert <> wrote:
>> On 20/03/2019 13:08, Gilles Sadowski wrote:
>>>> - Perhaps there is another place for them?
>>>> I am happy to not clutter the source repo and keep them somewhere for
>>>> devs. But where? You tell me. Maybe in the examples-stress module they
>>>> are OK as that one is not officially part of the library.
>>> Unless a script is robust and readily usable by a newbie (i.e. anyone
>>> but the author), there is the risk that it becomes cruft.
>>> What you did is great work but I doubt that many would be interested
>>> in detailed tables of the failures, once one can easily compare the number
>>> of failures.
>>> If more is needed, there is no shortcut to reading the doc of the test suites
>>> themselves and other resources on the web...
>>> So, to summarize, what I think is interesting is to make it easy to rerun
>>> the test suites and update the site (e.g. for the "current" JDK):
>>>    * a document that mentions the external tools requirement
>>>    * a standalone application for running "RandomStressTester"
>>>    * a script that collects results from the above run, formats them into the
>>>      "quality" table ready to be pasted into the "rng.apt" file, and copies the
>>>      output files to their appropriate place (*not* assuming a git repository)
>>> It would also be great to be able to easily start the many benchmarks
>>> and similarly collect the results into the user guide tables.
>> I will write a script that will collect the results from the benchmark,
>> build the apt table and copy the output files to a directory.
>> Note that the benchmark files only contain the RNG class name:
>> # RNG: org.apache.commons.rng.core.source32.JDKRandom
>> This means the script has to be kept up to date with the corresponding
>> enum name, and the desired output order for the apt table. I can add
>> these to a simple array at the top, which any newbie should be able to
>> extend.
>> It would be desirable for the script to search for results files in a
>> set of directories, identify them as dieharder (dh) or TestU01 (tu)
>> results and then output the files for each run N of a unique generator M
>> to the existing directory structure, e.g:
>>   > output dir1 dir2 ...
>> output/(dh|tu)/run_N/(dh|tu)_M
>> This should be done robustly such that the script can be pointed at the
>> git source tree for the site and it will figure out that the files are
>> already in the correct place. I think this would just involve sorting
>> the paths to all the results numerically not alphabetically (e.g. dh_1 <
>> dh_2 < dh_10) before processing them to output.
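The numeric (rather than alphabetic) path sort described above could be sketched as below. This is only an illustration, assuming result file names of the form dh_N / tu_N; the actual script may differ.

```python
# Sketch of a natural/numeric sort so that dh_2 orders before dh_10.
# The file names (dh_1, dh_2, dh_10, tu_1) are illustrative assumptions.
import re

def numeric_key(path):
    """Split a path into text and integer chunks so digits compare numerically."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", path)]

paths = ["dh_10", "dh_1", "dh_2", "tu_1"]
print(sorted(paths, key=numeric_key))
# A plain lexicographic sort would put "dh_10" before "dh_2".
```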
> Perhaps the "RandomStressTester" should be made more flexible
> and, instead of a hard-coded "GeneratorsList", accept a list of "enum"
> identifiers. e.g. a list (ASCII file) like:
> MT 3
> MT_64 3
> SPLIT_MIX_64 3
> [etc.]
> where the second column indicates how many runs to perform for
> this type of RNG.
> [A (mandatory) flag specifying this file would replace the first 2
> command-line arguments.]

That would be nice.


1. The GeneratorsList class allows a user to use another list that is on 
the Java classpath and test their own generators provided by that list. 
That is, if I understand how Java can build things using reflection. The 
likelihood of ever needing to do this is ... ? (very small)

2. The GeneratorsList class can be auto generated from the enum

I suppose it could be updated to require an input text list (as per your 
example) but if called with no arguments it can print out a default text 
list using the order of the RandomSource enum.

Thus it should not need to be modified if more generators are added.
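Parsing the proposed run-list file is simple; a minimal sketch follows. The file format (enum identifier, then run count) is the one suggested above; the function name is an assumption.

```python
# Sketch of parsing the proposed ASCII run list: "ENUM_NAME RUN_COUNT" per line.
# Blank lines and comment lines (starting with '#') are skipped.
def parse_run_list(text):
    runs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, count = line.split()
        runs.append((name, int(count)))
    return runs

example = """\
MT 3
MT_64 3
SPLIT_MIX_64 3
"""
print(parse_run_list(example))
# [('MT', 3), ('MT_64', 3), ('SPLIT_MIX_64', 3)]
```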

>> For the JMH benchmarks I tend to run e.g.
>>   > mvn test -P benchmark -Dbenchmark=NextDoublePerformance
>> I then have a script to parse the JSON in
>> target/jmh-result.NextDoublePerformance.json to a Jira format or CSV
>> format table. Usually I add some relative score columns in a spreadsheet
>> then paste into Jira (which nicely accepts the tables).
>> It should not be too hard to generate the existing apt tables for
>> performance. I can look into this. I am assuming that the tables are
>> based on the highest 'number of samples' from each generator.
> I also have a (Perl) script, that uses a JSON library, to parse the
> JMH output; but it's quite ugly and not robust (IIRC, I need to make a
> change in the code depending on what should serve as the "reference"
> to be mapped to "1", in the table).
> Regards,
> Gilles
So this should be standardised somehow. One to think about.
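As a starting point for standardising it, the relative-score step could be sketched as below. This assumes the standard JMH JSON result layout (a top-level array of objects with "benchmark" and "primaryMetric"/"score" fields) and lets the caller choose which benchmark maps to 1; the function name and sample data are assumptions.

```python
# Sketch: map JMH JSON results to scores relative to a chosen reference
# benchmark (the reference is mapped to 1.0).
import json

def relative_scores(jmh_json, reference):
    """Return {benchmark: score / reference_score} from JMH JSON output."""
    results = json.loads(jmh_json)
    scores = {r["benchmark"]: r["primaryMetric"]["score"] for r in results}
    ref = scores[reference]
    return {name: score / ref for name, score in scores.items()}

# Hypothetical sample in the JMH result format.
sample = json.dumps([
    {"benchmark": "baseline", "primaryMetric": {"score": 2.0}},
    {"benchmark": "JDKRandom", "primaryMetric": {"score": 8.0}},
])
print(relative_scores(sample, "baseline"))
# {'baseline': 1.0, 'JDKRandom': 4.0}
```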
>>>>>> [...]
>>>>>> You then have a utility for dumping the output of any random source
>>>>>> in a variety of formats.
>>>>>> Although long output is not needed for the test suites, it is useful
>>>>>> for native long generators.
>>>>>> WDYT?
>>>>> Looks good!
>>>> OK. I will work on the raw data dumper as a Jira ticket. It is
>>>> encapsulated work that does not really affect anything else.
>>>> DieHarder has finished!
>>>> I think my stupidity is what caused previous crashes. I was running the
>>>> stress test within the source tree, and possibly a git checkout onto
>>>> another branch makes some of the directory paths stale, killing any
>>>> processes linked to those paths. I'll not do that again.
>>> Hence, the "standalone" application is the right choice it seems.
>>>> FYI: Here are the old results with incorrect byte order:
>>>> XorShiftSerialComposite : 24, 25, 23 : 134.1 +/- 16.1
>>>> XorShiftXorComposite : 88, 105, 89 : 396.2 +/- 9.9
>>>> SplitXorComposite : 0, 0, 0 : 90.8 +/- 21.9
>>>> Here are the new results with correct byte order:
>>>> XorShiftSerialComposite : 13, 15, 10 : 105.5 +/- 1.8
>>>> XorShiftXorComposite : 57, 57, 57 : 102.9 +/- 1.5
>>>> SplitXorComposite : 0, 0, 0 : 99.9 +/- 3.2
>>>> So interestingly passing the correct byte order lowers the number of
>>>> failures. There are still lots.
>>>> And BigCrush (with the fix for passing the correct byte order):
>>>> XorShiftSerialComposite : 40, 39, 39 : 608.2 +/- 3.9
>>>> XorShiftXorComposite : 54, 53, 53 : 646.8 +/- 10.9
>>>> SplitXorComposite : 0, 0, 0 : 625.8 +/- 0.2
>>> Curious to know whether it is also affected by the byte ordering.
>> I'll re-run BigCrush with the wrong byte ordering when I have updated
>> the stress test code. I finished it yesterday but will look it over with
>> fresh eyes before committing it.
>> Alex
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
