commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilles <gil...@harfang.homelinux.org>
Subject Re: [Math] How fast is fast enough?
Date Sat, 06 Feb 2016 03:26:02 GMT
On Sat, 06 Feb 2016 02:07:05 +0000, James Carman wrote:
> Passion is a good thing. It means he gives a damn. Gilles obviously 
> cares
> quite a bit about the subject matter. I think it's great that he's 
> willing
> to crack open some code that a lot of folks wouldn't touch and 
> propose some
> interesting changes.

This way of viewing it makes all the difference!
Thanks.  I want to stay here... ;-)

[Actually, it also would not have occurred to me to go and look
under hood, until I had the truly bad idea of trying to quickly
get a user request into the code base...]

> Perhaps adding the new implementations alongside the
> existing ones would make sense? Maybe in a "beta" package? That way 
> we can
> get this stuff in front of folks and let them take it for a spin. 
> Maybe we
> create another artifact that we release with less strenuous rules 
> around
> backward compatibility? Just some thoughts.

I proposed as much in this message:
   http://markmail.org/message/nx2uuwlrvcun4cyq
as a consequence of abundant (destructive) criticism.


Gilles

>
> On Fri, Feb 5, 2016 at 8:54 PM Niall Pemberton 
> <niall.pemberton@gmail.com>
> wrote:
>
>> Are you not concerned about forming a TLP of 7 around Math when one 
>> of the
>> seven is clearly not a happy camper?
>>
>> Niall
>>
>> On Sat, Feb 6, 2016 at 12:07 AM, Phil Steitz <phil.steitz@gmail.com>
>> wrote:
>>
>> > On 2/5/16 12:59 PM, Gilles wrote:
>> > > On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote:
>> > >> On 2/4/16 3:59 PM, Gilles wrote:
>> > >>> Hi.
>> > >>>
>> > >>> Here is a micro-benchmark report (performed with 
>> "PerfTestUtils"):
>> > >>> -----
>> > >>> nextInt() (calls per timed block: 2000000, timed blocks: 100, 
>> time
>> > >>> unit: ms)
>> > >>>                         name time/call std dev total time 
>> ratio
>> > >>> cv difference
>> > >>> o.a.c.m.r.JDKRandomGenerator 1.088e-05 2.8e-06 2.1761e+03 
>> 1.000
>> > >>> 0.26 0.0000e+00
>> > >>>    o.a.c.m.r.MersenneTwister 1.024e-05 1.5e-06 2.0471e+03 
>> 0.941
>> > >>> 0.15 -1.2900e+02
>> > >>>           o.a.c.m.r.Well512a 1.193e-05 4.4e-07 2.3864e+03 
>> 1.097
>> > >>> 0.04 2.1032e+02
>> > >>>          o.a.c.m.r.Well1024a 1.348e-05 1.9e-06 2.6955e+03 
>> 1.239
>> > >>> 0.14 5.1945e+02
>> > >>>         o.a.c.m.r.Well19937a 1.495e-05 2.1e-06 2.9906e+03 
>> 1.374
>> > >>> 0.14 8.1451e+02
>> > >>>         o.a.c.m.r.Well19937c 1.577e-05 8.8e-07 3.1542e+03 
>> 1.450
>> > >>> 0.06 9.7816e+02
>> > >>>         o.a.c.m.r.Well44497a 1.918e-05 1.4e-06 3.8363e+03 
>> 1.763
>> > >>> 0.08 1.6602e+03
>> > >>>         o.a.c.m.r.Well44497b 1.953e-05 2.8e-06 3.9062e+03 
>> 1.795
>> > >>> 0.14 1.7301e+03
>> > >>>        o.a.c.m.r.ISAACRandom 1.169e-05 1.9e-06 2.3375e+03 
>> 1.074
>> > >>> 0.16 1.6139e+02
>> > >>> -----
>> > >>> where "cv" is the ratio of the 3rd to the 2nd column.
>> > >>>
>> > >>> Questions are:
>> > >>> * How meaningful are micro-benchmarks when the timed operation

>> has
>> > >>> a very
>> > >>>   small duration (wrt e.g. the duration of other machine
>> > >>> instructions that
>> > >>>   are required to perform them)?
>> > >>
>> > >> It is harder to get good benchmarks for shorter duration 
>> activities,
>> > >> but not impossible.  One thing that it would be good to do is 
>> to
>> > >> compare these results with JMH [1].
>> > >
>> > > I was expecting insights based on the benchmark which I did run.
>> >
>> > You asked whether or not benchmarks are meaningful when the task
>> > being benchmarked is short duration.  I answered that question.
>> > >
>> > > We have a tool in CM; if it's wrong, we should remove it.
>> > > How its results compare with JMH is an interesting question,
>> >
>> > I will look into this.
>> > > I
>> > > agree, but I don't have time to make an analysis of benchmarking
>> > > tools (on top of what I've been doing since December because
>> > > totally innocuous changes in the RNG classes were frowned upon
>> > > out of baseless fear).
>> >
>> > Please cut the hypberbole.
>> > >
>> > >>> * In a given environment (HW, OS, JVM), is there a lower limit
>> > >>> (absolute
>> > >>>   duration) below which anything will be deemed good enough?
>> > >>
>> > >> That depends completely on the application.
>> > >
>> > > Sorry, I thought that it was obvious: I don't speak of 
>> applications
>> > > that don't care about performance. :-)
>> > >
>> > > For those that do, I do not agree with the statement: the 
>> question
>> > > relates to finding a point below which it is the environment 
>> that
>> > > overwhelms the other conditions.
>> > > A point where there will be _unavoidable_ overhead (transferring 
>> data
>> > > from/to memory, JVM book-keeping, ...) and perturbations 
>> (context
>> > > switches, ...) such that their duration adds a constant time (on
>> > > average) that may render most enhancements to an already 
>> efficient
>> > > algorithm barely noticeable in practice.
>> > > Similarly, but in the opposite direction, some language 
>> constructs
>> > > or design choices might slow down things a bit, but without
>> > > endangering any user.
>> > >
>> > > A problem arises when any enhancement to the design is deemed
>> > > harmful because it degrades a micro-benchmark, even though that
>> > > benchmark may not reflect any real use-cases.
>> > > Then, the real harm is against development.
>> > >
>> > >>> * Can a library like CM admit a trade-off between ultimate
>> > >>> performance and
>> > >>>   good design?   IOW, is there an acceptable overhead in 
>> exchange
>> > >>> for other qualities
>> > >>>   (clarity, non-redundancy, extensibility, etc.)?
>> > >>
>> > >> That is too general a question to be meaningful.   We need to 
>> look
>> > >> at specific cases.  What exactly are you proposing?
>> > >
>> > > <rant>
>> > > It is quite meaningful even if it refers to general principles.
>> > > Those could (should, IMO) be taken into account when managing a
>> > > project like CM, on a par with "performance" (whose intrinsic 
>> value
>> > > is never questioned).
>> > > </rant>
>> >
>> > Rant all you want.  Vague generalities and hyperbole have no 
>> value.
>> > >
>> > > Two specific cases are:
>> > > * inheritance vs delegation (a.k.a. composition)
>> > > * generics (that could require runtime casts)
>> >
>> > This is getting closer to meaningful.  Where exactly in the code 
>> are
>> > you wanting to use something and seeing benchmark damage?
>> > >
>> > >>> * Does ultimate performance for the base functionality 
>> (generation
>> > >>> of a
>> > >>>   random number) trump any consideration of use-cases that 
>> would
>> > >>> need an
>> > >>>   extension (of the base functionality, such as computation to
>> > >>> match another
>> > >>>   distribution) that will unavoidably degrades the performance
>> > >>> (hence the
>> > >>>   micro-benchmark will be completely misleading for those 
>> users)?
>> > >>
>> > >> Again, this is vague and the answer depends on what exactly you 
>> are
>> > >> talking about. Significantly damaging performance of PRNG
>> > >> implementations is a bad idea,
>> > >
>> > > Now, *this* is vague: what do you mean by "significantly"?
>> > > That was actually my question in the first place.
>> > If you are talking about PRNG performance, I would say a 1% hit is
>> > significant.
>> > > Referring to the
>> > > benchmark above, people who'd know why they require ultimate
>> > > performance
>> > > should be able to tell what range of numbers they'd find
>> > > acceptable in
>> > > that table.
>> > >
>> > > <rant>
>> > > Actually my questions are very precise, but the answers would 
>> require
>> > > some decent analysis, rather than the usual "bad idea" 
>> dismissal.
>> > > </rant>
>> > >
>> > > In the Javadoc of the "random" package, there is information 
>> about
>> > > performance but no reference as to the benchmarking procedure.
>> >
>> > It would be great to repeat these using JMH, which is emerging as 
>> a
>> > de facto standard for java benchmarking.  I will look into this.
>> > >
>> > > I can consistently observe a totally different behaviour (using
>> > > "PerfTestUtils"):
>> > >  1. "MersenneTwister" is *always* faster than all of the WELL 
>> RNGs;
>> > >  2. moreover, the ratio *grows* with each of the longer periods
>> > >     members of the WELL family (see the above table).
>> > >
>> > > This makes me wonder how someone who purports to need "ultimate"
>> > > performance can have any objective basis to determine what is 
>> good
>> > > or bad for his own applications.
>> > >
>> > >> unless there are actual practical use
>> > >> cases you can point to that whatever changes you are proposing
>> > >> enable.
>> > >
>> > > As I've explained in very much details in another thread, I've
>> > > reviewed (from a design POV) the RNG code in "random" and IMHO, 
>> there
>> > > is room for improvement (cf. above for what I mean by that 
>> term).
>> > > <rant>
>> > > I have some code ready for review but I had to resort to what I
>> > > considered sub-optimal design (preemptively renouncing to 
>> propose a
>> > > "delegation"-based design) solely because of the destructive
>> > > community
>> > > process that takes place here.[1]
>> > > </rant>
>> >
>> > More vague hyperbole that serves no purpose.  Please focus on 
>> actual
>> > code or design issues.
>> > >
>> > > The practical use-cases is anything that needs further 
>> processing of
>> > > the numbers produced according to a uniform distribution:
>> >
>> > Isn't that what the samplers in the distributions package do?  
>> What
>> > we need from the PRNG implementations is just blocks of bits.  
>> Since
>> > we wanted a pluggable replacement for j.u.Random, we added uniform
>> > ints, longs and floats and gaussian floats.  The samplers just 
>> need
>> > uniform doubles.  The practical use case we need is well-supported
>> > in the code we have.  What is missing, exactly?
>> > > I agree that
>> > > there would be little sense to code that latter part in a "pure" 
>> OO
>> > > way[2].  And Luc made it indeed quite efficient, I think, in the
>> > > various
>> > > concrete classes.
>> > > What I want to reconsider is how those concrete low-level
>> > > algorithms can
>> > > be plugged in a higher-level function that just requires a 
>> "source of
>> > > randomness", as I'd call a provider of "int" (or "long") values,
>> > > where
>> > > the high level functionality does not care at all about the
>> > > provider's
>> > > inner working (a.o. how it's seeded!).
>> >
>> > This is why many higher-level samplers and other things that 
>> require
>> > random data inside [math] have a pluggable RandomGenerator.
>> > >
>> > > A case in point is the sampling of other distributions (namely 
>> the
>> > > Normal distribution).
>> >
>> > Or any of the others.  We have a default, inversion-based method
>> > that the abstract distribution classes provide and some pretty 
>> good
>> > specialized implementations within individual distributions.  Most
>> > of these just require uniform random doubles as source.
>> >
>> > >
>> > > Here is the benchmark report:
>> > > -----
>> > > nextGaussian() (calls per timed block: 2000000, timed blocks: 
>> 100,
>> > > time unit: ms)
>> > >                         name time/call std dev total time ratio
>> > > cv difference
>> > > o.a.c.m.r.JDKRandomGenerator 1.200e-05 1.7e-06 2.4001e+03 1.000
>> > > 0.14 0.0000e+00
>> > > o.a.c.m.r.JDKRandomGenerator 7.646e-05 5.1e-06 1.5292e+04 6.371
>> > > 0.07 1.2892e+04
>> > >    o.a.c.m.r.MersenneTwister 6.396e-05 3.6e-06 1.2793e+04 5.330
>> > > 0.06 1.0393e+04
>> > >           o.a.c.m.r.Well512a 6.880e-05 5.0e-06 1.3760e+04 5.733
>> > > 0.07 1.1360e+04
>> > >          o.a.c.m.r.Well1024a 6.956e-05 3.0e-06 1.3913e+04 5.797
>> > > 0.04 1.1513e+04
>> > >         o.a.c.m.r.Well19937a 7.262e-05 2.0e-06 1.4525e+04 6.052
>> > > 0.03 1.2125e+04
>> > >         o.a.c.m.r.Well19937c 7.164e-05 4.3e-06 1.4329e+04 5.970
>> > > 0.06 1.1928e+04
>> > >         o.a.c.m.r.Well44497a 8.166e-05 3.2e-06 1.6332e+04 6.804
>> > > 0.04 1.3931e+04
>> > >         o.a.c.m.r.Well44497b 8.259e-05 4.6e-06 1.6518e+04 6.882
>> > > 0.06 1.4118e+04
>> > >        o.a.c.m.r.ISAACRandom 6.724e-05 5.4e-06 1.3449e+04 5.603
>> > > 0.08 1.1049e+04
>> > > -----
>> > > where the first line is JDK's "nextInt()" and the remaining are
>> > > "nextGaussian()".
>> > >
>> > > The generation time is thus about 4-fold that of "nextInt()".
>> > > Thus, degrading the performance of "nextInt()" by 10% would
>> > > degrade the
>> > > performance of "nextGaussian()" by half that.
>> > >
>> > > For a performance discussion to be meaningful, I think that we'd 
>> need
>> > > to know how that fact would affect, even modestly, any 
>> moderately
>> > > complex
>> > > post-processing of the generated values.
>> > >
>> > > Another case, for modularity, would be to consider that other
>> > > algorithms could
>> > > be implemented to provide the required distribution.[3]
>> > > In the current design (inheritance-based), that can only be done
>> > > by creating
>> > > a subclass, even though the core functionality ("nextDouble()") 
>> is
>> > > not
>> > > overridden.
>> > >
>> > >>> * What are usages of the CM RNGs?
>> > >>>   Do those use-cases strictly forbid "loosing" a dozen
>> > >>> milliseconds per
>> > >>>   million calls?
>> > >>
>> > >> There are many different use cases.  My own applications use 
>> them in
>> > >> simulations to generate random deviates, to generate random hex
>> > >> strings as identifiers and in stochastic algorithms like some 
>> of our
>> > >> internal uses.  The last case is definitely sensitive to PRNG
>> > >> performance.
>> > >
>> > > Thanks for giving examples, but since we talk about performance, 
>> I
>> > > was hoping for some real flesh, like the relative duration of 
>> numbers
>> > > generation (e.g. the total duration of calls to the 
>> "RandomGenerator"
>> > > instances wrt to the total duration of the application).
>> > >
>> > > I don't know if by "last case", you are referring to code that 
>> is
>> > > inside CM.  I didn't spot anything that makes "heavy" usage of a
>> > > RNG (in the sense that generation would count as a sizable part 
>> of
>> > > the whole processing).
>> > monteCarloP in KolmogorovSmirnovTest is one to check.
>> > >
>> > > As I pointed out many times: if an application is severely 
>> dependent
>> > > on the performance of RNG, the user probably will turn to 
>> specific
>> > > tools (e.g. GPUs? [4]) rather than use CM.
>> >
>> > That is a bogus argument.  We should make our PRNGs simple and 
>> fast
>> > so their use can extend to performance-sensitive applications.
>> > >
>> > > Conversely, using Java might be preferred for its flexibility, 
>> which
>> > > is destroyed by a search for ultimate performance (which nobody 
>> seems
>> > > able to define reasonably).
>> > > Performance is not a goal in itself; it should not be a trophy 
>> which
>> > > sits uselessly on a shelf.
>> >
>> > Nor should "beautiful design" in the eyes of one person.
>> > >
>> > > My goal is not to deliberately slow things down; it is to allow 
>> some
>> > > leeway so that designs which are deemed better (on all counts 
>> except,
>> > > perhaps, performance) are given a chance to show their 
>> strengths, in
>> > > particular in areas where performance in absolute terms is "good
>> > > enough" for all use-cases which CM should care about (hence the 
>> need
>> > > of actual data points[5]).
>> >
>> > I see no reason that we can't have it both ways - good design and
>> > good performance. What we have now, modulo maybe some small 
>> changes
>> > to reduce code duplication, works fine.  If you want to play with
>> > 64-bit generators and can find reference implementations and 
>> verify
>> > that they do in fact perform better, great.  If not, I don't see 
>> the
>> > point.  You can rant and complain all you want; but I am not going
>> > to let us trash performance or correctness of code in the random
>> > class or anywhere else just because you think it is somehow 
>> "better
>> > designed"  unless you can show specific, practical use cases
>> > demonstrating the value of the changes.
>> >
>> > Phil
>> > >
>> > >
>> > > Gilles
>> > >
>> > > [1] "Is it faster?"
>> > >     "No."
>> > >     "Then, no."
>> > > [2] Although that is in some sense what you indirectly defend by
>> > > wanting
>> > >     to stick with a meaningless "next(int bits)" method.
>> > > [3] http://www.doornik.com/research/ziggurat.pdf
>> > > [4] http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html
>> > > [5] Hence the need to agree on a methodology/policy for 
>> benchmarking.
>> > >
>> > >>
>> > >> Phil
>> > >>
>> > >> [1] http://openjdk.java.net/projects/code-tools/jmh/
>> > >>>   IOW, would those users for which such a difference matters 
>> use
>> > >>> CM at all?
>> > >>
>> > >>>
>> > >>>
>> > >>> Thanks,
>> > >>> Gilles
>> > >
>> > >
>> > >
>> > > 
>> ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> > > For additional commands, e-mail: dev-help@commons.apache.org
>> > >
>> > >
>> >
>> >
>> >
>> > 
>> ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> > For additional commands, e-mail: dev-help@commons.apache.org
>> >
>> >
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message