commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Oliver Heger <>
Subject Re: [Math] How fast is fast enough?
Date Sat, 06 Feb 2016 21:09:31 GMT

Am 06.02.2016 um 12:31 schrieb James Carman:
> Okay, folks, this is definitely getting out of hand. Let's put a moratorium
> on this thread for the weekend or something and try to come back together
> next week and try to move forward. I would urge folks to watch this while
> we wait:
> p.s. Phil, I do hope you'll reconsider.

Big +1! Please take some time to rethink. This would be a huge loss for
this community.


> On Fri, Feb 5, 2016 at 10:47 PM Phil Steitz <> wrote:
>> OK, I give up.  I am withdrawing as volunteer chair or member of the
>> new TLP.
>> Phil
>> On 2/5/16 7:23 PM, Gilles wrote:
>>> Phil,
>>> You talk again about me trying to push forward changes that
>>> serve no purpose besides "trash performance and correctness".
>>> This is again baseless FUD to which I've already answered
>>> (with detailed list of facts which you chose to ignore).
>>> You declare anything for which you don't have an answer as
>>> "bogus argument". Why is the reference to multi-threaded
>>> implementations bogus?  You contradict yourself in pretending
>>> that CM RNGs could be so good as to make people want to use
>>> them while refusing to consider whether another design might
>>> be better suited to such high(er)-performance extensions.
>>> This particular case is a long shot but if any and all
>>> discussions are stopped dead, how do you imagine that we can
>>> go anywhere?
>>> As you could read from experts, micro-benchmarks are deceiving;
>>> but you refuse to even consider alternative designs if there
>>> might be a slight suspicion of degradation.
>>> How can we ever set up a constructive discussion on how to
>>> make everybody enjoy this project if the purported chair is
>>> so bent to protecting existing code rather than nurture a good
>>> relationship with developers who may sometimes have other ideas?
>>> I'm trying to improve the code (in a dimension which you can't
>>> seem to understand unfortunately) but respectfully request
>>> data points from those users of said code, in order to be
>>> able to prove that no harm will be done.
>>> But you seem to prefer to not disclose anything that would
>>> get us closer to agreement (better design with similar
>>> performance and room for improvement, to be discussed
>>> together as a real development team -- Not you requiring,
>>> as a bad boss, that I bow to your standards for judging
>>> usefulness).
>>> This 1% which you throw at me, where does it come from?
>>> What does 1% mean when the benchmark shows standard deviations
>>> that vary from 4 to 26% in the "nextInt" case and from 3 to
>>> 7% in the "nextGaussian" case?
>>> This 1% looks meaningless without context; context is what I'm
>>> asking in order to try and establish objectively whether
>>> another design will have a measurable impact on actual tasks.
>>> I'm not going to show any "damaged" benchmark because of how
>>> unwelcome you make me feel every time I wish to talk about
>>> other aspects of the code.
>>> There is no development community here.  Only solitary
>>> coders who share a repository.
>>> Not sorry for the top-post,
>>> Gilles
>>> On Fri, 5 Feb 2016 17:07:16 -0700, Phil Steitz wrote:
>>>> On 2/5/16 12:59 PM, Gilles wrote:
>>>>> On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote:
>>>>>> On 2/4/16 3:59 PM, Gilles wrote:
>>>>>>> Hi.
>>>>>>> Here is a micro-benchmark report (performed with
>>>>>>> "PerfTestUtils"):
>>>>>>> -----
>>>>>>> nextInt() (calls per timed block: 2000000, timed blocks: 100,
>>>>>>> time
>>>>>>> unit: ms)
>>>>>>>                         name time/call std dev total time ratio
>>>>>>> cv difference
>>>>>>> o.a.c.m.r.JDKRandomGenerator 1.088e-05 2.8e-06 2.1761e+03 1.000
>>>>>>> 0.26 0.0000e+00
>>>>>>>    o.a.c.m.r.MersenneTwister 1.024e-05 1.5e-06 2.0471e+03 0.941
>>>>>>> 0.15 -1.2900e+02
>>>>>>>           o.a.c.m.r.Well512a 1.193e-05 4.4e-07 2.3864e+03 1.097
>>>>>>> 0.04 2.1032e+02
>>>>>>>          o.a.c.m.r.Well1024a 1.348e-05 1.9e-06 2.6955e+03 1.239
>>>>>>> 0.14 5.1945e+02
>>>>>>>         o.a.c.m.r.Well19937a 1.495e-05 2.1e-06 2.9906e+03 1.374
>>>>>>> 0.14 8.1451e+02
>>>>>>>         o.a.c.m.r.Well19937c 1.577e-05 8.8e-07 3.1542e+03 1.450
>>>>>>> 0.06 9.7816e+02
>>>>>>>         o.a.c.m.r.Well44497a 1.918e-05 1.4e-06 3.8363e+03 1.763
>>>>>>> 0.08 1.6602e+03
>>>>>>>         o.a.c.m.r.Well44497b 1.953e-05 2.8e-06 3.9062e+03 1.795
>>>>>>> 0.14 1.7301e+03
>>>>>>>        o.a.c.m.r.ISAACRandom 1.169e-05 1.9e-06 2.3375e+03 1.074
>>>>>>> 0.16 1.6139e+02
>>>>>>> -----
>>>>>>> where "cv" is the ratio of the 3rd to the 2nd column.
>>>>>>> Questions are:
>>>>>>> * How meaningful are micro-benchmarks when the timed operation
>>>>>>> has
>>>>>>> a very
>>>>>>>   small duration (wrt e.g. the duration of other machine
>>>>>>> instructions that
>>>>>>>   are required to perform them)?
>>>>>> It is harder to get good benchmarks for shorter duration
>>>>>> activities,
>>>>>> but not impossible.  One thing that it would be good to do is to
>>>>>> compare these results with JMH [1].
>>>>> I was expecting insights based on the benchmark which I did run.
>>>> You asked whether or not benchmarks are meaningful when the task
>>>> being benchmarked is short duration.  I answered that question.
>>>>> We have a tool in CM; if it's wrong, we should remove it.
>>>>> How its results compare with JMH is an interesting question,
>>>> I will look into this.
>>>>> I
>>>>> agree, but I don't have time to make an analysis of benchmarking
>>>>> tools (on top of what I've been doing since December because
>>>>> totally innocuous changes in the RNG classes were frowned upon
>>>>> out of baseless fear).
>>>> Please cut the hypberbole.
>>>>>>> * In a given environment (HW, OS, JVM), is there a lower limit
>>>>>>> (absolute
>>>>>>>   duration) below which anything will be deemed good enough?
>>>>>> That depends completely on the application.
>>>>> Sorry, I thought that it was obvious: I don't speak of applications
>>>>> that don't care about performance. :-)
>>>>> For those that do, I do not agree with the statement: the question
>>>>> relates to finding a point below which it is the environment that
>>>>> overwhelms the other conditions.
>>>>> A point where there will be _unavoidable_ overhead (transferring
>>>>> data
>>>>> from/to memory, JVM book-keeping, ...) and perturbations (context
>>>>> switches, ...) such that their duration adds a constant time (on
>>>>> average) that may render most enhancements to an already efficient
>>>>> algorithm barely noticeable in practice.
>>>>> Similarly, but in the opposite direction, some language constructs
>>>>> or design choices might slow down things a bit, but without
>>>>> endangering any user.
>>>>> A problem arises when any enhancement to the design is deemed
>>>>> harmful because it degrades a micro-benchmark, even though that
>>>>> benchmark may not reflect any real use-cases.
>>>>> Then, the real harm is against development.
>>>>>>> * Can a library like CM admit a trade-off between ultimate
>>>>>>> performance and
>>>>>>>   good design?   IOW, is there an acceptable overhead in exchange
>>>>>>> for other qualities
>>>>>>>   (clarity, non-redundancy, extensibility, etc.)?
>>>>>> That is too general a question to be meaningful.   We need to look
>>>>>> at specific cases.  What exactly are you proposing?
>>>>> <rant>
>>>>> It is quite meaningful even if it refers to general principles.
>>>>> Those could (should, IMO) be taken into account when managing a
>>>>> project like CM, on a par with "performance" (whose intrinsic value
>>>>> is never questioned).
>>>>> </rant>
>>>> Rant all you want.  Vague generalities and hyperbole have no value.
>>>>> Two specific cases are:
>>>>> * inheritance vs delegation (a.k.a. composition)
>>>>> * generics (that could require runtime casts)
>>>> This is getting closer to meaningful.  Where exactly in the code are
>>>> you wanting to use something and seeing benchmark damage?
>>>>>>> * Does ultimate performance for the base functionality
>>>>>>> (generation
>>>>>>> of a
>>>>>>>   random number) trump any consideration of use-cases that would
>>>>>>> need an
>>>>>>>   extension (of the base functionality, such as computation to
>>>>>>> match another
>>>>>>>   distribution) that will unavoidably degrades the performance
>>>>>>> (hence the
>>>>>>>   micro-benchmark will be completely misleading for those users)?
>>>>>> Again, this is vague and the answer depends on what exactly you
>>>>>> are
>>>>>> talking about. Significantly damaging performance of PRNG
>>>>>> implementations is a bad idea,
>>>>> Now, *this* is vague: what do you mean by "significantly"?
>>>>> That was actually my question in the first place.
>>>> If you are talking about PRNG performance, I would say a 1% hit is
>>>> significant.
>>>>> Referring to the
>>>>> benchmark above, people who'd know why they require ultimate
>>>>> performance
>>>>> should be able to tell what range of numbers they'd find
>>>>> acceptable in
>>>>> that table.
>>>>> <rant>
>>>>> Actually my questions are very precise, but the answers would
>>>>> require
>>>>> some decent analysis, rather than the usual "bad idea" dismissal.
>>>>> </rant>
>>>>> In the Javadoc of the "random" package, there is information about
>>>>> performance but no reference as to the benchmarking procedure.
>>>> It would be great to repeat these using JMH, which is emerging as a
>>>> de facto standard for java benchmarking.  I will look into this.
>>>>> I can consistently observe a totally different behaviour (using
>>>>> "PerfTestUtils"):
>>>>>  1. "MersenneTwister" is *always* faster than all of the WELL RNGs;
>>>>>  2. moreover, the ratio *grows* with each of the longer periods
>>>>>     members of the WELL family (see the above table).
>>>>> This makes me wonder how someone who purports to need "ultimate"
>>>>> performance can have any objective basis to determine what is good
>>>>> or bad for his own applications.
>>>>>> unless there are actual practical use
>>>>>> cases you can point to that whatever changes you are proposing
>>>>>> enable.
>>>>> As I've explained in very much details in another thread, I've
>>>>> reviewed (from a design POV) the RNG code in "random" and IMHO,
>>>>> there
>>>>> is room for improvement (cf. above for what I mean by that term).
>>>>> <rant>
>>>>> I have some code ready for review but I had to resort to what I
>>>>> considered sub-optimal design (preemptively renouncing to propose a
>>>>> "delegation"-based design) solely because of the destructive
>>>>> community
>>>>> process that takes place here.[1]
>>>>> </rant>
>>>> More vague hyperbole that serves no purpose.  Please focus on actual
>>>> code or design issues.
>>>>> The practical use-cases is anything that needs further
>>>>> processing of
>>>>> the numbers produced according to a uniform distribution:
>>>> Isn't that what the samplers in the distributions package do?  What
>>>> we need from the PRNG implementations is just blocks of bits.  Since
>>>> we wanted a pluggable replacement for j.u.Random, we added uniform
>>>> ints, longs and floats and gaussian floats.  The samplers just need
>>>> uniform doubles.  The practical use case we need is well-supported
>>>> in the code we have.  What is missing, exactly?
>>>>> I agree that
>>>>> there would be little sense to code that latter part in a "pure" OO
>>>>> way[2].  And Luc made it indeed quite efficient, I think, in the
>>>>> various
>>>>> concrete classes.
>>>>> What I want to reconsider is how those concrete low-level
>>>>> algorithms can
>>>>> be plugged in a higher-level function that just requires a
>>>>> "source of
>>>>> randomness", as I'd call a provider of "int" (or "long") values,
>>>>> where
>>>>> the high level functionality does not care at all about the
>>>>> provider's
>>>>> inner working (a.o. how it's seeded!).
>>>> This is why many higher-level samplers and other things that require
>>>> random data inside [math] have a pluggable RandomGenerator.
>>>>> A case in point is the sampling of other distributions (namely the
>>>>> Normal distribution).
>>>> Or any of the others.  We have a default, inversion-based method
>>>> that the abstract distribution classes provide and some pretty good
>>>> specialized implementations within individual distributions.  Most
>>>> of these just require uniform random doubles as source.
>>>>> Here is the benchmark report:
>>>>> -----
>>>>> nextGaussian() (calls per timed block: 2000000, timed blocks: 100,
>>>>> time unit: ms)
>>>>>                         name time/call std dev total time ratio
>>>>> cv difference
>>>>> o.a.c.m.r.JDKRandomGenerator 1.200e-05 1.7e-06 2.4001e+03 1.000
>>>>> 0.14 0.0000e+00
>>>>> o.a.c.m.r.JDKRandomGenerator 7.646e-05 5.1e-06 1.5292e+04 6.371
>>>>> 0.07 1.2892e+04
>>>>>    o.a.c.m.r.MersenneTwister 6.396e-05 3.6e-06 1.2793e+04 5.330
>>>>> 0.06 1.0393e+04
>>>>>           o.a.c.m.r.Well512a 6.880e-05 5.0e-06 1.3760e+04 5.733
>>>>> 0.07 1.1360e+04
>>>>>          o.a.c.m.r.Well1024a 6.956e-05 3.0e-06 1.3913e+04 5.797
>>>>> 0.04 1.1513e+04
>>>>>         o.a.c.m.r.Well19937a 7.262e-05 2.0e-06 1.4525e+04 6.052
>>>>> 0.03 1.2125e+04
>>>>>         o.a.c.m.r.Well19937c 7.164e-05 4.3e-06 1.4329e+04 5.970
>>>>> 0.06 1.1928e+04
>>>>>         o.a.c.m.r.Well44497a 8.166e-05 3.2e-06 1.6332e+04 6.804
>>>>> 0.04 1.3931e+04
>>>>>         o.a.c.m.r.Well44497b 8.259e-05 4.6e-06 1.6518e+04 6.882
>>>>> 0.06 1.4118e+04
>>>>>        o.a.c.m.r.ISAACRandom 6.724e-05 5.4e-06 1.3449e+04 5.603
>>>>> 0.08 1.1049e+04
>>>>> -----
>>>>> where the first line is JDK's "nextInt()" and the remaining are
>>>>> "nextGaussian()".
>>>>> The generation time is thus about 4-fold that of "nextInt()".
>>>>> Thus, degrading the performance of "nextInt()" by 10% would
>>>>> degrade the
>>>>> performance of "nextGaussian()" by half that.
>>>>> For a performance discussion to be meaningful, I think that we'd
>>>>> need
>>>>> to know how that fact would affect, even modestly, any moderately
>>>>> complex
>>>>> post-processing of the generated values.
>>>>> Another case, for modularity, would be to consider that other
>>>>> algorithms could
>>>>> be implemented to provide the required distribution.[3]
>>>>> In the current design (inheritance-based), that can only be done
>>>>> by creating
>>>>> a subclass, even though the core functionality ("nextDouble()") is
>>>>> not
>>>>> overridden.
>>>>>>> * What are usages of the CM RNGs?
>>>>>>>   Do those use-cases strictly forbid "loosing" a dozen
>>>>>>> milliseconds per
>>>>>>>   million calls?
>>>>>> There are many different use cases.  My own applications use
>>>>>> them in
>>>>>> simulations to generate random deviates, to generate random hex
>>>>>> strings as identifiers and in stochastic algorithms like some
>>>>>> of our
>>>>>> internal uses.  The last case is definitely sensitive to PRNG
>>>>>> performance.
>>>>> Thanks for giving examples, but since we talk about performance, I
>>>>> was hoping for some real flesh, like the relative duration of
>>>>> numbers
>>>>> generation (e.g. the total duration of calls to the
>>>>> "RandomGenerator"
>>>>> instances wrt to the total duration of the application).
>>>>> I don't know if by "last case", you are referring to code that is
>>>>> inside CM.  I didn't spot anything that makes "heavy" usage of a
>>>>> RNG (in the sense that generation would count as a sizable part of
>>>>> the whole processing).
>>>> monteCarloP in KolmogorovSmirnovTest is one to check.
>>>>> As I pointed out many times: if an application is severely
>>>>> dependent
>>>>> on the performance of RNG, the user probably will turn to specific
>>>>> tools (e.g. GPUs? [4]) rather than use CM.
>>>> That is a bogus argument.  We should make our PRNGs simple and fast
>>>> so their use can extend to performance-sensitive applications.
>>>>> Conversely, using Java might be preferred for its flexibility,
>>>>> which
>>>>> is destroyed by a search for ultimate performance (which nobody
>>>>> seems
>>>>> able to define reasonably).
>>>>> Performance is not a goal in itself; it should not be a trophy
>>>>> which
>>>>> sits uselessly on a shelf.
>>>> Nor should "beautiful design" in the eyes of one person.
>>>>> My goal is not to deliberately slow things down; it is to allow
>>>>> some
>>>>> leeway so that designs which are deemed better (on all counts
>>>>> except,
>>>>> perhaps, performance) are given a chance to show their
>>>>> strengths, in
>>>>> particular in areas where performance in absolute terms is "good
>>>>> enough" for all use-cases which CM should care about (hence the
>>>>> need
>>>>> of actual data points[5]).
>>>> I see no reason that we can't have it both ways - good design and
>>>> good performance. What we have now, modulo maybe some small changes
>>>> to reduce code duplication, works fine.  If you want to play with
>>>> 64-bit generators and can find reference implementations and verify
>>>> that they do in fact perform better, great.  If not, I don't see the
>>>> point.  You can rant and complain all you want; but I am not going
>>>> to let us trash performance or correctness of code in the random
>>>> class or anywhere else just because you think it is somehow "better
>>>> designed"  unless you can show specific, practical use cases
>>>> demonstrating the value of the changes.
>>>> Phil
>>>>> Gilles
>>>>> [1] "Is it faster?"
>>>>>     "No."
>>>>>     "Then, no."
>>>>> [2] Although that is in some sense what you indirectly defend by
>>>>> wanting
>>>>>     to stick with a meaningless "next(int bits)" method.
>>>>> [3]
>>>>> [4]
>>>>> [5] Hence the need to agree on a methodology/policy for
>>>>> benchmarking.
>>>>>> Phil
>>>>>> [1]
>>>>>>>   IOW, would those users for which such a difference matters
>>>>>>> CM at all?
>>>>>>> Thanks,
>>>>>>> Gilles
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message