commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex D Herbert (JIRA)" <>
Subject [jira] [Commented] (RNG-88) Update the GenerationPerformance benchmark
Date Thu, 04 Apr 2019 11:06:00 GMT


Alex D Herbert commented on RNG-88:

I have created new JMH benchmarks for all the methods defined in {{UniformRandomProvider}}.

A baseline implementation has been created for each method and scales linearly with workload:


(Note: The {{nextBytes()}} method is plotted on the second Y-axis as the measurement time
is very different.)

Each method from {{UniformRandomProvider}} for each random source is then benchmarked against
a baseline implementation. This separates out the benchmarks into separate classes that share
the baseline implementation. E.g. {{NextFloatGenerationPerformance}} for {{nextFloat}} but
{{NextIntGenerationPerformance}} for {{nextInt()}} and {{nextInt(int)}}.

An example benchmark looks like this:
 * Executes benchmark to compare the speed of generation of random numbers from the
 * various source providers for {@link UniformRandomProvider#nextDouble()}.
public class NextDoubleGenerationPerformance extends AbstractBenchmark {
     * The benchmark state (retrieve the various "RandomSource"s).
    public static class Sources extends BaselineSources {
        protected UniformRandomProvider createBaseline() {
            return BaselineUtils.getNextDouble();

    /** The value. */
    private double value;

     * Baseline for a JMH method call with no return value.
    public void baselineVoid() {
        // Do nothing, this is a baseline

     * Baseline for a JMH method call returning a {@code double}.
     * @return the value
    public double baselineDouble() {
        return value;

     * Exercise the {@link UniformRandomProvider#nextDouble()} method.
     * @param sources Source of randomness.
     * @return the double
    public double nextDouble(Sources sources) {
        return sources.getGenerator().nextDouble();
The {{BaselineSources}} declares all RNG providers plus a keyword {{BASELINE}}. When it identifies
this then it calls the method to create the baseline. Otherwise it just creates the RandomSource
using the standard factory method.

I found stability on my test machine using 10 iterations for warm-up and measurement of 1
second each.
Intel(R) Xeon(R) CPU E5-1680 v3 @ 3.20GHz
openjdk version "1.8.0_191"
Linux 4.4.0-145-generic #171-Ubuntu SMP Tue Mar 26 12:43:40 UTC
2019 x86_64 x86_64 x86_64 GNU/Linux
I ran all the benchmarks, subtracted the BASELINE from the runtime metric then expressed the
results relative to the JDK. I found the median of 10 runs to be more stable than the average
as there can be a few outliers that skew the score. Here are the relative median scores:


This should be compared to [RNG User Guide|].
Those results show that the fastest generators are SPLIT_MIX, XOR_SHIFT_1024_S, TWO_CMRES,
MWC_256 and slowest are JDK and WELL_44497x. This chart shows the same results but the differences
are larger.

For reference the results of the SplitMix in the User guide (v1.2) verses the new results:
|nextBytes|0.2179846927| |
|nextLongN|0.1803571971| |
|nextFloat|0.2567586134| |
|nextIntN|0.3662874911| |

In the case of the SplitMix the JMH overhead for time measurement for {{nextLong()}} is 63%
of the total time. So by subtracting the baseline the differences between a fast generator
and a slow one are more pronounced.

I also note the strangely fast speed of {{nextBoolean()}} for the JDK implementation, where
it is fastest despite not being a fast generator of {{int}} values. The implementation for
this method is standard for an {{IntProvider}}. Store a single {{int}} value and then use
each consecutive bit of the 32-bits for a {{boolean}}. So the speed up must be related to
the relative speed of testing the bits from the JDK source verses the others, and we know
from the BigCrush results that the JDK bit source is not good.

I've opened [PR 35|] with the new code.

This idea can be applied to other benchmarks currently using a loop to test performance that
do not require input data/state change on each loop iteration. These are:
 * GeometricSamplersPerformance
 * SamplersPerformance
 * NextGaussianPerformance

In the case of SamplersPerformance it may be better to separate the DiscreteSampler and ContinuousSampler
into two benchmarks that can have a baseline implementation of the appropriate sampler interface.

I can do this in a new PR, the same PR or directly within master.

The following benchmarks currently exploit the loop used in the benchmark method and cannot
be modified:
 * ThreadLocalPerformance
 * PoissonSamplerCachePerformance

> Update the GenerationPerformance benchmark
> ------------------------------------------
>                 Key: RNG-88
>                 URL:
>             Project: Commons RNG
>          Issue Type: Improvement
>          Components: examples
>    Affects Versions: 1.3
>            Reporter: Alex D Herbert
>            Assignee: Alex D Herbert
>            Priority: Minor
>         Attachments: baseline.jpg, next.png
>          Time Spent: 10m
>  Remaining Estimate: 0h
> The current GenerationPerformance benchmark runs all the generators to collect timing
data. However the act of running the test within JMH takes some time (test overhead). This
overhead should be measured and subtracted from the timing data to create a time attributed
only to the method exercised in the RNG.
> This can be done by creating a dummy RNG that returns a fixed value. The implementation
must be done in a manner where the JIT compiler is not able to remove the dummy method from
the execution path. This is achieved by returning state variables from the dummy RNG (not
final variables).
> Testing can be done using a variable number of iterations and the run-times assessed
for linearity. If the run time scale with the number of iterations then the JIT compiler has
not removed it from execution. The dummy RNG then serves as a baseline for comparison of true
> This idea with examples is shown in [RNG-87|]
which tested a variant of the MWC_256 algorithm.

This message was sent by Atlassian JIRA

View raw message