commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gilles <gil...@harfang.homelinux.org>
Subject [RNG] About RNG-6 (Was: [VOTE][RC1] Release Commons Rng 1.0)
Date Thu, 15 Sep 2016 14:37:10 GMT
Hi.

Moving to another thread, as your comments do not belong in
a release vote.

Many of them could lead to interesting discussions that should
(and could) have happened earlier.

Indeed, for the record, see
   https://issues.apache.org/jira/browse/RNG-6
and the numerous posts which I sent to this list
  * while ironing out the code of this new component in its own
    repository, and doing the various chores (site, userguide,
    quality testing, Github, Travis, Coveralls, etc) for the last
    6 weeks,
  * while developing that code in a new Commons Math package
    "o.a.c.math4.rng" for the last 6 months,
  * while fixing the code in Commons Math "o.a.c.m.random"
    package for the last 9 months.

For a number of issues which you now raise, I've explicitly
asked for input.

I find it a bit strange that, at this precise point, you start
to question the choices which I've made in the absence of any
prior remarks, suggestions, or warnings.

On Wed, 14 Sep 2016 17:04:53 +0200, Emmanuel Bourg wrote:
> Hi all,
>
> RNG is moving fast, that's nice. I got a quick look at the API and 
> the
> site, here are my comments:
>
> - The main page (Overview) could be enhanced with a description of 
> the
> API longer than a single sentence. It could mention the motivations 
> for
> the API, the implementations supported, a sample code snippet and a 
> link
> to the user guide.

I don't understand "motivations for the API".

Do you mean something like my comment in
   https://issues.apache.org/jira/browse/RNG-6
?

[My motivation for proposing this code can be gathered from
the archives and the bugs and shortcomings reported in the
JIRA MATH project, and related posts here). They do not
belong in that "overview" page.]

> - In the Performance section of the User Guide, I'd suggest grouping 
> the
> 3 tables into a single one. It might also be interesting to put the
> quality results in the same table, so we can see the trade-off 
> between
> performance and quality.

I won't do that.
It was already a pain to create those tables as they are (see the
doc source).

> - The License section of the user guide could be removed, I guess
> everyone knows that Apache projects use the Apache license.

No problem.
I thought that it was mandatory.

All the pages were copied from the corresponding Commons Math ones.

> - The internal packages should be hidden from the Javadoc,

I do not know how do that.

And I don't see the point, since from the "binary" POV, they
are "public".

> otherwise
> users may be tempted to directly use the classes there.

You must have seen that there are prominent warnings against
doing that.

Users will do what they want anyways.
What is not supported is not supported.  We agreed here that it
was enough to tag a class as "internal" in order to fulfill our
obligation of least surprise.

> This would imply
> copying the documentation of the generators into the RandomSource 
> class.

I do not agree.

Users can select a generator without knowing the gory details.
If they want to know more, they go to the internals.
It they really want to know, they go to the references.

Internal should not be hidden: contributors who'd fancy to provide
additional generators do not have to go through any hoops.

Or do you suggest that this component should supply two JARs:
   * commons-rng-api
      Contents of "o.a.c.rng"
   * commons-rng-core
      Contents of "o.a.c.rng.internal"
?

And two sets of Javadoc?

> - Why isn't the UniformRandomProvider interface simply named
> RandomProvider?

Maybe because we want to be transparent on what the code does (?).

With this name, there is no ambiguity: no more, no less than what
the name implies.

> Is there any plan to implement non uniform random
> generators in the future? Will they have a different interface?

This point has been discussed on this list (although, again, there
was a severe lack of feedback...).

Executive summary:
   No strong coupling between generators (of uniform sequence of
   pseudo-random numbers) and utilities on top of those generators
   (e.g. generators producing non-uniform sequences).
   Utilities should go in another component/package/module (see
   e.g. Commons Math's packages "o.a.c.math4.distribution" and
   "o.a.c.math4.random" for candidate code).

> - UniformRandomProvider mimics the java.util.Random methods which is
> nice. Is there any value in also implementing the nextGaussian() 
> method
> such that our API is a perfect drop-in replacement for the JDK class?

No. [IMHO.]
This was discussed on this list (and cf. above point).

> - For code relying on the java.util.Random type, it might be useful 
> to
> provide an adapter turning an UniformRandomProvider into a
> java.util.Random implementation (Hipparchus has one).

This was discussed on this list.

And see Commons Math's classes "o.a.c.math4.random.RandomUtils" and
"o.a.c.math4.random.RngAdaptor" classes.

In the latter (being deprecated as of its creation), I indicated why
it is a bad idea.

Again, if we really want to support the use-case you proposed, it's
a utility that should go in another component.

> - The choice of an enum for the factory class RandomSource is a bit
> unusual. It is probably ok for simple RNG with a simple seed, but 
> with
> more complex implementations I find it less intuitive to use.

Less intuitive than what?

> For example the create(RandomSource, Object Object...) method has 
> this
> snippet in its documentation:
>
>   RandomSource.create(RandomSource.TWO_CMRES_SELECT, 26219, 6, 9)
>
> When a user types this in its IDE, there is no type information nor
> contextual documentation for the parameters expected. The user has to
> know the type of the seed and the meaning of the parameters expected,
> the IDE won't be able to help here (with CTRL+P and CTRL+Q in 
> IntelliJ).

Honestly, I don't care.
To put it less bluntly, the choice of RNG to instantiate is not a
matter of IDE.

Practically, the lack of type information (which I don't like either)
was a trade-off to have a dead-simple API.

As a user, one doesn't need an intelligent IDE to know all there is
to know about Commons RNG, and it will take less than 10 minutes!

> This design induces some complexity, the loss of the seed type leads 
> to
> the seed conversion mechanism (most of the util package if I 
> understand
> well).
>
> I'd feel more comfortable with a more classic factory design, 
> something
> like:
>
>   RandomSource.createTwoCmres()
>   RandomSource.createTwoCmres(Integer seed)
>   RandomSource.createTwoCmres(Integer seed, int i, int j)
>
> or a shorter form like:
>
>   RNG.newTwoCmres()
>   RNG.newTwoCmres(int seed)
>   RNG.newTwoCmres(Integer seed, int i, int j)
>
> That will add more methods to the factory, but it'll be much more 
> usable
> in my opinion.

Not in mine.

This design came after 9 months of sustained work.
Nothing to be proud of in terms of LOC/day...
But what you propose to drop is the only non trivial part of
this work.

As I said above, this is a trade-off, and it looks like a fault
only because Java is strongly typed.

The automatic seed conversion is a huge advantage for casual
users; they can still seed them with a single "long" (and under
the hood get a good "native" seed).
But the doc is there to warn them that RNGs usually have a larger
state.

> - I'm not convinced by the need for the convenience static methods
> RandomSource.createInt/Long. If the user doesn't want to specify the
> seed, he could simply use the seed-less create method. And if the 
> random
> generator requires extra parameters, he could use a null value for 
> the
> seed as a way to mean "generate one for me please":
>
>   RandomSource.create(RandomSource.TWO_CMRES_SELECT, null, 6, 9)

These methods take care of a common use-case, where the actual
seed does not matter (hence the caller might like to not have to
code similar methods), but the _same_ seed must be used in another
execution of the same application to ensure reproducibility (e.g.
for bug tracking).

See also:
   
https://docs.oracle.com/javase/7/docs/api/java/security/SecureRandom.html#generateSeed%28int%29

> - RandomSource.saveState/restoreState allows one to convert a random
> generator to/from a byte array. This looks a lot like Java
> serialization. Maybe the implementations supporting this feature 
> could
> be marked as Serializable, thus it would be possible to call
> RandomSource.saveState only if the instance is Serializable instead 
> of
> catching the UnsupportedOperationException thrown when the feature 
> isn't
> supported.

No, no, no.
Avoiding the "Serializable" clutter was a _requirement_.

Serialization is not part of this component, but is supported by all
implementations.

It's a feature that users can _choose_ the persistence framework
they want and not be forced to consider the pitfalls of "Serializable".
[See "Effective Java".]

It's also a feature that this component will never have to deal
with CVE related to exploitation of something "Serializable" that
should not have been...

> - Why is the byte[] state encapsulated into the RandomSource.State
> class? Why not using a byte array directly?

Because encapsulation is the hallmark of good OO programming?

If some application only needs to save/restore without actual
persistence, it can do without dealing with the "ugliness" of
"byte[]".
And it will continue to work even if the internal representation
is changed.

Exposing the internals was unavoidable for the serialization
use-cases.

>
> So far I'm +1 for a beta release aimed at getting more feedback on 
> the
> API,

There were a lot of discussions here about what "beta release"
means and how to implement it in practice (package naming,
versioning scheme).

They never led anywhere.

Snapshot JARs have been available since the Jenkins task has been
up and running.

I've asked for feedback on this list, on the site since it's been
up, and on "helpwanted.apache.org".

I seriously doubt that having code tagged "beta" would be more
successful at getting users to provide feedback.

It will more likely turn them away, thinking that the generators
might be buggy.  Which is not the case.

As much as I advocated it in order to have more timely releases
of Commons Math, I think that in this case, a "beta" release
(if such a thing existed in Commons) is nothing short of
counter-productive.

The API is what it is.  As for any other component; and, quite
frankly, in proportion of the lines of code, much more thought
was put into it than for anything in Commons Math, and probably
many other Commons components).

I do not deny that it might be wrong (or I wouldn't have filed
"RNG-6") but at this point it is usage that should demonstrate
it.  Not doubt based on... I don't know what.

> but I'm -1 for releasing the current code as 1.0.

Any technical reasons?


Gilles
[Truly sorry that this discussion did not happen at the proper
time. And hoping that constructive feedback like this one will
come when I start talking about "commons-rng-tools".]


>
> Emmanuel Bourg
>
>
> Le 11/09/2016 à 16:55, Gilles a écrit :
>> Hi.
>>
>> This is a [VOTE] for releasing Apache Commons Rng 1.0 (from RC1).
>>
>> Tag name:
>>   RNG_1_0_RC1 (signature can be checked from git using 'git tag -v')
>>
>> Tag URL:
>>
>> 
>> https://git-wip-us.apache.org/repos/asf?p=commons-rng.git;a=commit;h=f8d23082607b9f2c7be7f489eb09627722440ee5
>>
>>
>> Commit ID the tag points at:
>>   f8d23082607b9f2c7be7f489eb09627722440ee5
>>
>> Site:
>>   http://home.apache.org/~erans/commons-rng-1.0-RC1-site
>>
>> Distribution files:
>>   https://dist.apache.org/repos/dist/dev/commons/rng/
>>
>> Distribution files hashes (SHA1):
>>   a221e862c8ff970a9ca3e7fbd86c3200d1f8780a 
>> commons-rng-1.0-bin.tar.gz
>>   689b2bfbdb1856d4f47851d75762aab42057805a commons-rng-1.0-bin.zip
>>   40b7b1639eedf91b5fad5d38e6ebec01e659048f 
>> commons-rng-1.0-src.tar.gz
>>   6296dbabde10169d6365bda99f2af6dcc191e515 commons-rng-1.0-src.zip
>>
>> KEYS file to check signatures:
>>   http://www.apache.org/dist/commons/KEYS
>>
>> Maven artifacts:
>>
>> 
>> https://repository.apache.org/content/repositories/orgapachecommons-1199/org/apache/commons/commons-rng/1.0/
>>
>>
>> [ ] +1 Release it.
>> [ ] +0 Go ahead; I don't care.
>> [ ] -0 There are a few minor glitches: ...
>> [ ] -1 No, do not release it because ...
>>
>> This vote will close in 72 hours, at 2016-09-14T15:10:00Z (this is 
>> UTC
>> time).
>> ----------
>>
>> Thanks,
>> Gilles
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message