commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilles <gil...@harfang.homelinux.org>
Subject Re: [Math] How fast is fast enough?
Date Fri, 05 Feb 2016 19:59:24 GMT
On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote:
> On 2/4/16 3:59 PM, Gilles wrote:
>> Hi.
>>
>> Here is a micro-benchmark report (performed with "PerfTestUtils"):
>> -----
>> nextInt() (calls per timed block: 2000000, timed blocks: 100, time
>> unit: ms)
>>                         name time/call std dev total time ratio
>> cv difference
>> o.a.c.m.r.JDKRandomGenerator 1.088e-05 2.8e-06 2.1761e+03 1.000
>> 0.26 0.0000e+00
>>    o.a.c.m.r.MersenneTwister 1.024e-05 1.5e-06 2.0471e+03 0.941
>> 0.15 -1.2900e+02
>>           o.a.c.m.r.Well512a 1.193e-05 4.4e-07 2.3864e+03 1.097
>> 0.04 2.1032e+02
>>          o.a.c.m.r.Well1024a 1.348e-05 1.9e-06 2.6955e+03 1.239
>> 0.14 5.1945e+02
>>         o.a.c.m.r.Well19937a 1.495e-05 2.1e-06 2.9906e+03 1.374
>> 0.14 8.1451e+02
>>         o.a.c.m.r.Well19937c 1.577e-05 8.8e-07 3.1542e+03 1.450
>> 0.06 9.7816e+02
>>         o.a.c.m.r.Well44497a 1.918e-05 1.4e-06 3.8363e+03 1.763
>> 0.08 1.6602e+03
>>         o.a.c.m.r.Well44497b 1.953e-05 2.8e-06 3.9062e+03 1.795
>> 0.14 1.7301e+03
>>        o.a.c.m.r.ISAACRandom 1.169e-05 1.9e-06 2.3375e+03 1.074
>> 0.16 1.6139e+02
>> -----
>> where "cv" is the ratio of the 3rd to the 2nd column.
>>
>> Questions are:
>> * How meaningful are micro-benchmarks when the timed operation has
>> a very
>>   small duration (wrt e.g. the duration of other machine
>> instructions that
>>   are required to perform them)?
>
> It is harder to get good benchmarks for shorter duration activities,
> but not impossible.  One thing that it would be good to do is to
> compare these results with JMH [1].

I was expecting insights based on the benchmark which I did run.

We have a tool in CM; if it's wrong, we should remove it.
How its results compare with JMH is an interesting question, I
agree, but I don't have time to make an analysis of benchmarking
tools (on top of what I've been doing since December because
totally innocuous changes in the RNG classes were frowned upon
out of baseless fear).

>> * In a given environment (HW, OS, JVM), is there a lower limit
>> (absolute
>>   duration) below which anything will be deemed good enough?
>
> That depends completely on the application.

Sorry, I thought that it was obvious: I don't speak of applications
that don't care about performance. :-)

For those that do, I do not agree with the statement: the question
relates to finding a point below which it is the environment that
overwhelms the other conditions.
A point where there will be _unavoidable_ overhead (transferring data
from/to memory, JVM book-keeping, ...) and perturbations (context
switches, ...) such that their duration adds a constant time (on
average) that may render most enhancements to an already efficient
algorithm barely noticeable in practice.
Similarly, but in the opposite direction, some language constructs
or design choices might slow down things a bit, but without
endangering any user.

A problem arises when any enhancement to the design is deemed
harmful because it degrades a micro-benchmark, even though that
benchmark may not reflect any real use-cases.
Then, the real harm is against development.

>> * Can a library like CM admit a trade-off between ultimate
>> performance and
>>   good design?   IOW, is there an acceptable overhead in exchange
>> for other qualities
>>   (clarity, non-redundancy, extensibility, etc.)?
>
> That is too general a question to be meaningful.   We need to look
> at specific cases.  What exactly are you proposing?

<rant>
It is quite meaningful even if it refers to general principles.
Those could (should, IMO) be taken into account when managing a
project like CM, on a par with "performance" (whose intrinsic value
is never questioned).
</rant>

Two specific cases are:
* inheritance vs delegation (a.k.a. composition)
* generics (that could require runtime casts)

>> * Does ultimate performance for the base functionality (generation
>> of a
>>   random number) trump any consideration of use-cases that would
>> need an
>>   extension (of the base functionality, such as computation to
>> match another
>>   distribution) that will unavoidably degrades the performance
>> (hence the
>>   micro-benchmark will be completely misleading for those users)?
>
> Again, this is vague and the answer depends on what exactly you are
> talking about. Significantly damaging performance of PRNG
> implementations is a bad idea,

Now, *this* is vague: what do you mean by "significantly"?
That was actually my question in the first place.  Referring to the
benchmark above, people who'd know why they require ultimate 
performance
should be able to tell what range of numbers they'd find acceptable in
that table.

<rant>
Actually my questions are very precise, but the answers would require
some decent analysis, rather than the usual "bad idea" dismissal.
</rant>

In the Javadoc of the "random" package, there is information about
performance but no reference as to the benchmarking procedure.

I can consistently observe a totally different behaviour (using
"PerfTestUtils"):
  1. "MersenneTwister" is *always* faster than all of the WELL RNGs;
  2. moreover, the ratio *grows* with each of the longer periods
     members of the WELL family (see the above table).

This makes me wonder how someone who purports to need "ultimate"
performance can have any objective basis to determine what is good
or bad for his own applications.

> unless there are actual practical use
> cases you can point to that whatever changes you are proposing 
> enable.

As I've explained in very much details in another thread, I've
reviewed (from a design POV) the RNG code in "random" and IMHO, there
is room for improvement (cf. above for what I mean by that term).
<rant>
I have some code ready for review but I had to resort to what I
considered sub-optimal design (preemptively renouncing to propose a
"delegation"-based design) solely because of the destructive community
process that takes place here.[1]
</rant>

The practical use-cases is anything that needs further processing of
the numbers produced according to a uniform distribution: I agree that
there would be little sense to code that latter part in a "pure" OO
way[2].  And Luc made it indeed quite efficient, I think, in the 
various
concrete classes.
What I want to reconsider is how those concrete low-level algorithms 
can
be plugged in a higher-level function that just requires a "source of
randomness", as I'd call a provider of "int" (or "long") values, where
the high level functionality does not care at all about the provider's
inner working (a.o. how it's seeded!).

A case in point is the sampling of other distributions (namely the
Normal distribution).
Here is the benchmark report:
-----
nextGaussian() (calls per timed block: 2000000, timed blocks: 100, time 
unit: ms)
                         name time/call std dev total time ratio   cv 
difference
o.a.c.m.r.JDKRandomGenerator 1.200e-05 1.7e-06 2.4001e+03 1.000 0.14 
0.0000e+00
o.a.c.m.r.JDKRandomGenerator 7.646e-05 5.1e-06 1.5292e+04 6.371 0.07 
1.2892e+04
    o.a.c.m.r.MersenneTwister 6.396e-05 3.6e-06 1.2793e+04 5.330 0.06 
1.0393e+04
           o.a.c.m.r.Well512a 6.880e-05 5.0e-06 1.3760e+04 5.733 0.07 
1.1360e+04
          o.a.c.m.r.Well1024a 6.956e-05 3.0e-06 1.3913e+04 5.797 0.04 
1.1513e+04
         o.a.c.m.r.Well19937a 7.262e-05 2.0e-06 1.4525e+04 6.052 0.03 
1.2125e+04
         o.a.c.m.r.Well19937c 7.164e-05 4.3e-06 1.4329e+04 5.970 0.06 
1.1928e+04
         o.a.c.m.r.Well44497a 8.166e-05 3.2e-06 1.6332e+04 6.804 0.04 
1.3931e+04
         o.a.c.m.r.Well44497b 8.259e-05 4.6e-06 1.6518e+04 6.882 0.06 
1.4118e+04
        o.a.c.m.r.ISAACRandom 6.724e-05 5.4e-06 1.3449e+04 5.603 0.08 
1.1049e+04
-----
where the first line is JDK's "nextInt()" and the remaining are 
"nextGaussian()".

The generation time is thus about 4-fold that of "nextInt()".
Thus, degrading the performance of "nextInt()" by 10% would degrade the
performance of "nextGaussian()" by half that.

For a performance discussion to be meaningful, I think that we'd need
to know how that fact would affect, even modestly, any moderately 
complex
post-processing of the generated values.

Another case, for modularity, would be to consider that other 
algorithms could
be implemented to provide the required distribution.[3]
In the current design (inheritance-based), that can only be done by 
creating
a subclass, even though the core functionality ("nextDouble()") is not
overridden.

>> * What are usages of the CM RNGs?
>>   Do those use-cases strictly forbid "loosing" a dozen
>> milliseconds per
>>   million calls?
>
> There are many different use cases.  My own applications use them in
> simulations to generate random deviates, to generate random hex
> strings as identifiers and in stochastic algorithms like some of our
> internal uses.  The last case is definitely sensitive to PRNG
> performance.

Thanks for giving examples, but since we talk about performance, I
was hoping for some real flesh, like the relative duration of numbers
generation (e.g. the total duration of calls to the "RandomGenerator"
instances wrt to the total duration of the application).

I don't know if by "last case", you are referring to code that is
inside CM.  I didn't spot anything that makes "heavy" usage of a
RNG (in the sense that generation would count as a sizable part of
the whole processing).

As I pointed out many times: if an application is severely dependent
on the performance of RNG, the user probably will turn to specific
tools (e.g. GPUs? [4]) rather than use CM.

Conversely, using Java might be preferred for its flexibility, which
is destroyed by a search for ultimate performance (which nobody seems
able to define reasonably).
Performance is not a goal in itself; it should not be a trophy which
sits uselessly on a shelf.

My goal is not to deliberately slow things down; it is to allow some
leeway so that designs which are deemed better (on all counts except,
perhaps, performance) are given a chance to show their strengths, in
particular in areas where performance in absolute terms is "good
enough" for all use-cases which CM should care about (hence the need
of actual data points[5]).


Gilles

[1] "Is it faster?"
     "No."
     "Then, no."
[2] Although that is in some sense what you indirectly defend by 
wanting
     to stick with a meaningless "next(int bits)" method.
[3] http://www.doornik.com/research/ziggurat.pdf
[4] http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html
[5] Hence the need to agree on a methodology/policy for benchmarking.

>
> Phil
>
> [1] http://openjdk.java.net/projects/code-tools/jmh/
>>   IOW, would those users for which such a difference matters use
>> CM at all?
>
>>
>>
>> Thanks,
>> Gilles



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message