commons-dev mailing list archives

From Gilles <>
Subject Re: [Math] Commons Math (r)evolution
Date Thu, 09 Jun 2016 21:12:40 GMT
Hello Jörg.

On Thu, 09 Jun 2016 09:43:06 +0200, Jörg Schaible wrote:
> Hi Gilles,
> Gilles wrote:
>> Hi.
>> On Wed, 8 Jun 2016 23:50:00 +0300, Artem Barger wrote:
>>> On Wed, Jun 8, 2016 at 12:25 AM, Gilles
>>> <>
>>> wrote:
>>>> According to JIRA, among 180 issues currently targeted for the
>>>>>> next major release (v4.0), 139 have been resolved (75 of which
>>>>>> were not in v3.6.1).
>>>>> Huh, it's above of 75% completion :)
>>>> Everybody is welcome to review the "open" issues and comment
>>>> about them.
>>> I guess someone needs to prioritize them according to their
>>> importance for release.
>> Importance is relative... :-}
>> IMO, it is important to not release unsupported code.
> Unit test *are* kind of support.

Unit tests are not what I mean by "support".  They only increase the
probability that the code behaves as expected. [And sometimes they do
not because they can be buggy too, as I discovered when refactoring
the "random" package.]

But anyways, my reservations have nothing to do with the functionality
of released code: users who are satisfied with the service provided by
v3.6.1 (or any of the previous versions of CM) have no reason to upgrade
to 4.0.  [By upgrading, all they get is the obligation to change the
"import" statements.]

And we have no reason to release a v4.0 of a code that
  1. has not changed
  2. is not supported

>> So the priority would be higher for issues that would be included
>> in the release of the new Commons components.
>> Hence the need to figure out what these components will be.
>>>>>> Of course, anyone who wishes to maintain some of these codes
>>>>>> (answer user questions, fix bugs, create enhancements, etc.)
>>>>>> is most welcome to step forward.
>>>>> I can try to cover some of these and maintain relevant code
>>>>> parts.
>>>> Which ones?
>>> I will look into JIRA and provide the issue numbers, and of course I
>>> can cover and assist with ML part and particular clustering.
>> Thanks.
>>>> IMO, a maintainer is someone who is able to respond to user
>>>> questions and to figure out whether a bug report is valid.
>>> I'm subscribed for mailing list for quite a while and haven't
>>> seen a lot of questions coming from users.
>> The "user" ML has always been fairly quiet.
>> Does it mean that the code is really easy to use?
>> Or feature-complete (I doubt that)?
>> Or that there are very few users for the most complex features?
>> The "dev" ML was usually (much) more active.
>> The point is that when someone asks a question or proposes a
>> contribution, there must be someone to answer.
> And this is IMHO a wrong assumption. We have a lot of components where
> the original authors have left long ago. So the situation is not new.

Having no support is bad (IMO).
[It doesn't have to be from the original authors of course.]

> Math is a specialized library and nobody expects that it is accompanied
> by tutorials explaining the theory or developers that act as trainers
> here on the lists. Users of special algorithms are supposed to be
> experts themselves and should understand what they are doing. Or do you
> expect that any arbitrary user can use genetic algorithms or neuronal
> network stuff without the mathematical background?

No, I do not expect that.
[Although it is sometimes part of the resolution of a bug report, and
something that gives a sense of "you are welcome here".]

The main point is about real bugs that won't be handled (see below).

> Anything is well and can be released as long as the existing code is
> verified by unit tests. Otherwise we would have to remove a lot of code
> every time we release a component ... or do you expect e.g. that the
> release manager of vfs understands completely any of its providers?

No, certainly not, since I could RM CM. ;-)

But that's not the point!

_Some_ developer(s) should be able to support whatever is in 
Otherwise how can it be deemed "in development"?

Just today, two issues were reported on JIRA:

They, unfortunately, illustrate my point.

Moreover, what could be true for VFS is not for CM, where there are many,
many different areas that have nothing in common (except perhaps some
ubiquitous, very low-level utilities which might be worth their own
component to serve as a, maybe "internal", dependency).

Also, compare the basic source statistics (lines of code):
               VFS      CM
Java code    24215   90834
Unit tests    8926   95595

All in all, CM is more than 5 times larger than VFS (not even counting

>>>>> I think that clustering part could be generalized to ML package
>>>>> as a whole.
>>>> Fine I guess, since currently the "neuralnet" sub-package's only
>>>> concrete functionality is also a clustering method.
>>> I was also wondering whenever ML package meant to be extended in
>>> the future
>> Really there was no plan, or as many plans as there were developers...
>> Putting all these codes (with different designs, different coding
>> practices, different intended audiences, different levels of expertise,
>> etc.) in a single library was not sustainable.
>> That's why I strongly favour cutting this monolith into pieces
>> with a limited scope.
> Nobody objects, but if you look at vfs, it is still *one* Apache Commons
> component, just with multiple artifacts. All these artifacts are
> released *together*.

Sorry I'm lost, I looked there:

And, it seems that all the functionality is in a single JAR.
[Other files contain the sources, tests, examples.]

Anyways, it is obvious that, in VFS, there is a well defined scope
(a unifying rationale).

No such thing in CM.

What I want to achieve is indeed to create a set of components that are
more like VFS!

This is particularly obvious with the RNGs where there is one unifying
interface, a factory method and multiple implementations.
[Of course, in that case, the new component will be much simpler than
VFS (which is a "good thing", isn't it?).]
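
To make the "one unifying interface, a factory method and multiple
implementations" idea concrete, here is a minimal sketch.  All names in it
(RngSketch, UniformSource, Algorithm, create) are hypothetical and chosen
for illustration only; they are not the API of any Commons component, and
the two toy generators are stand-ins rather than the implementations
discussed in this thread.

```java
import java.util.function.LongFunction;

/** Hypothetical sketch: every name here is illustrative, not a Commons API. */
public class RngSketch {

    /** The single unifying interface that every generator implements. */
    public interface UniformSource {
        int nextInt();
    }

    /** Toy 64-bit linear congruential generator (Knuth's MMIX constants). */
    static final class Lcg64 implements UniformSource {
        private long state;

        Lcg64(long seed) {
            state = seed;
        }

        @Override
        public int nextInt() {
            state = state * 6364136223846793005L + 1442695040888963407L;
            return (int) (state >>> 32); // keep the high 32 bits
        }
    }

    /** Toy 64-bit xorshift generator (shift triple 13, 7, 17). */
    static final class XorShift64 implements UniformSource {
        private long state;

        XorShift64(long seed) {
            // An all-zero state would stay zero forever, so replace it.
            state = (seed == 0) ? 0x9E3779B97F4A7C15L : seed;
        }

        @Override
        public int nextInt() {
            state ^= state << 13;
            state ^= state >>> 7;
            state ^= state << 17;
            return (int) (state >>> 32); // keep the high 32 bits
        }
    }

    /** Identifiers for the available implementations. */
    public enum Algorithm {
        LCG_64(Lcg64::new),
        XOR_SHIFT_64(XorShift64::new);

        private final LongFunction<UniformSource> constructor;

        Algorithm(LongFunction<UniformSource> constructor) {
            this.constructor = constructor;
        }
    }

    /** Factory method: callers never name a concrete class. */
    public static UniformSource create(Algorithm algorithm, long seed) {
        return algorithm.constructor.apply(seed);
    }

    public static void main(String[] args) {
        for (Algorithm algorithm : Algorithm.values()) {
            UniformSource rng = create(algorithm, 12345L);
            System.out.println(algorithm + " -> " + rng.nextInt());
        }
    }
}
```

With such a layout, user code depends only on the interface and the
factory, so an implementation can be added or fixed without touching
callers, which is what keeps the scope of the component well defined.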

> Turning math into a multi-project has nothing to do with your
> plans to drop mature code,

I am not dropping anything (others did that); I am stating facts and I
now want to spend my time on something (hopefully) worth it.  [Working
to modularize unsupported code is a (huge) waste of time.]

Also, in the case of CM, "mature code" is meaningless as an overall
qualifier: some codes are
  * new (and never released, e.g. 64-bits-based RNGs)
  * algorithms introduced relatively recently (and perhaps never used)
  * old (and sometimes outdated and impossible to fix without breaking
  * mostly functional (but impossible to maintain, cf. MATH-1375)
  * resulting from a refactoring (hence even when the functionality has
    existed for a long time, the code is not "mature")

IMHO, maturity should be visible in the code.  It's an impression that
builds up by looking at the code as a whole, and coming to the conclusion
that indeed there is some overall consistency across files and 

Within some CM packages: yes (even if "mature" would certainly not mean
free of sometimes serious problems).

Across the whole library: certainly *not*.
[For reasons I could expand on.  But I did several times (cf. archives)
without succeeding in changing course.]

> because you (and currently no-one else) cannot
> answer questions to its functionality.

See the first post in this thread, in the part about gradually 
codes if and when they are supported by a new team.


> Cheers,
> Jörg
