commons-dev mailing list archives

From: Phil Steitz
Subject: Re: [Math] Moving on or not?
Date: Thu, 07 Feb 2013 16:32:46 GMT
On 2/7/13 8:04 AM, Gilles wrote:
> On Thu, 07 Feb 2013 07:01:42 -0800, Phil Steitz wrote:
>> On 2/7/13 4:58 AM, Gilles wrote:
>>> On Wed, 06 Feb 2013 09:46:55 -0800, Phil Steitz wrote:
>>>> On 2/6/13 9:03 AM, Gilles wrote:
>>>>> On Wed, 06 Feb 2013 07:19:47 -0800, Phil Steitz wrote:
>>>>>> On 2/5/13 6:08 AM, Gilles wrote:
>>>>>>> Hi.
>>>>>>> In the thread about "static import", Stephen noted that decisions
>>>>>>> on a component's evolution are dependent on whether the future of
>>>>>>> the Java language is taken into account, or not.
>>>>>>> A question on the same theme also arose after the presentation of
>>>>>>> Commons Math in FOSDEM 2013.
>>>>>>> If we assume that efficiency is among the important qualities for
>>>>>>> Commons Math, the future is to allow usage of the tools provided
>>>>>>> by the standard Java library in order to ease the development of
>>>>>>> multi-threaded algorithms.
>>>>>>> Maintaining Java 1.5 source compatibility for the reason that we
>>>>>>> may need to support legacy applications will turn out to be
>>>>>>> self-defeating:
>>>>>>> 1. New users will not consider Commons Math's features that are
>>>>>>>    notably apt to parallel processing.
>>>>>>> 2. Current users might at some point simply switch to another
>>>>>>>    library if it proves more efficient (because it actually uses
>>>>>>>    multi-threading).
>>>>>>> 3. New Java developers will be turned away because they will want
>>>>>>>    to use the more convenient features of the language in order
>>>>>>>    to provide potential contributions.
>>>>>>> If maintaining 1.5 source compatibility is kept as a requirement,
>>>>>>> the consequence is that Commons Math will _become_ a legacy
>>>>>>> library.
>>>>>>> In that perspective, implementing/improving algorithms for which a
>>>>>>> parallel version is known to be more efficient is plainly a waste
>>>>>>> of development and maintenance time.
>>>>>>> In order to mitigate the risks (both of upgrading and of not
>>>>>>> upgrading the source compatibility requirement), I would propose
>>>>>>> to create a new project (say, "Commons Math MT") where we could
>>>>>>> implement new features[1] without being encumbered with the 1.5
>>>>>>> requirement.[2]
>>>>>>> The "Commons Math MT" would depend on "Commons Math" where we
>>>>>>> would continue developing single-thread (and thread-safe) "tasks",
>>>>>>> i.e. independent units of processing that could be used in
>>>>>>> algorithms located in "Commons Math MT".
>>>>>>> In summary:
>>>>>>> - Commons Math (as usual):
>>>>>>>   * single-thread (sequential) algorithms,
>>>>>>>   * (pure) Java 5,
>>>>>>>   * no dependencies.
>>>>>>> - Commons Math MT:
>>>>>>>   * multi-thread (parallel) algorithms,
>>>>>>>   * Java 7 and beyond,
>>>>>>>   * JNI allowed,
>>>>>>>   * dependencies allowed (jCuda).
>>>>>>> What do you think?
>>>>>> There are several other possibilities to consider:
>>>>>> 0) Implement multithreading using JDK 1.5 primitives
>>>>>> 1) Set things up within [math] to support parallel execution in
>>>>>> JDK 1.7, Hadoop or other frameworks
>>>>>> 2) Instead of a new project, start a 4.x branch targeting JDK 1.7
>>>>>> I think we should maintain a version that has no dependencies and
>>>>>> no JNI in any case.
>>>>>> Starting a branch and getting concrete about how to parallelize
>>>>>> some algorithms would be a good way to start.  One thing I have
>>>>>> not really investigated and would be interested in details on is
>>>>>> what you actually get in efficiency gain (or loss?) using fork /
>>>>>> join vs just using 1.5+ concurrency for the kinds of problems we
>>>>>> would end up using this stuff for.
>>>>>> Thinking about specific parallelization problem instances would
>>>>>> also help decide whether 1) makes sense (i.e., whether it makes
>>>>>> sense as you mention above to maintain a single-threaded library
>>>>>> that provides task execution for a multithreaded version or
>>>>>> multithreaded frameworks).
>>>>>> One more thing to consider is that for at least some users of
>>>>>> [math], having the library internally spawn threads and/or peg
>>>>>> multiple processors may not be desirable.  It is a little
>>>>>> misleading to say that multithreading is the way to get
>>>>>> "efficiency."  It is really the way to *use* more compute
>>>>>> resources and unless there are real algorithmic improvements, the
>>>>>> overall efficiency may actually be less, due to task coordination
>>>>>> overhead.  What you get is faster execution due to more greedy
>>>>>> utilization of available cores.  Actual efficiency (how much
>>>>>> overall compute resource it takes to complete a job) partly
>>>>>> depends on how efficiently the coordination itself is done (which
>>>>>> JDK 1.7 claims to do very well - I have just not seen
>>>>>> substantiation or any benchmarks demonstrating this) and how the
>>>>>> parallelization affects overall compute requirements.  In any
>>>>>> case, for environments where library thread-spawning is not
>>>>>> desirable, I think we should maintain a single-threaded version.
>>>>> Unless I missed the point, those reasons are exactly why I propose
>>>>> to have 2 projects/components. One, "Commons-Math", does not fiddle
>>>>> with resources, while the other would provide a
>>>>> "parallelizationLevel" setting for the algorithms written to
>>>>> possibly take advantage of the Java 5+ "task framework".
>>>> OK, what about the 4.x option?
>>>>> Yes, we could still be good by using only Java 5's concurrency
>>>>> features but the issue I raise is not only about concurrency but
>>>>> about evolution/progress/maintenance, all things that require
>>>>> raising interest from new contributors (unless it's fine that
>>>>> Commons Math be tagged as a "library of the past"...).
>>>> +1 for experimenting with parallelization.  I would just like to
>>>> understand if the JDK 7 stuff really adds much - in particular,
>>>> does it handle coordination / cpu allocation better than you could
>>>> easily do it with 1.5?  More supported JDKs == more potential
>>>> users, so I like to see a real reason to bump the JDK level.
>>>>> But using concurrency features in "Commons Math" would also
>>>>> contradict your own point ("we should maintain a single-threaded
>>>>> version"): I agree, and that's why I proposed this other project...
>>>>> As for efficiency (or faster execution, if you want), I don't see
>>>>> the point in doubting that tasks like global search (e.g. in a
>>>>> genetic algorithm) will complete in less time when run in
>>>>> parallel...
>>>>> As I summarized previously, having a "Commons Math MT" would bring
>>>>> no inconvenience, contrary to any of your points 0, 1, or 2. [No
>>>>> inconvenience to me, that is, but to people with requirements like
>>>>> "Java 5 compatible" or "no multi-threading".]
>>>>> As I indicated, the basic "task" could be defined in "Commons Math"
>>>>> and "Commons Math MT" would provide the parallelization "glue"
>>>>> (e.g. to divide the search space of the GA).
>>>> I think it is best at this point to cut a branch and actually start
>>>> working on specific algorithms.  Having a set of candidate
>>>> algorithms for parallelization will help us decide what we actually
>>>> need and how it might work.  I would personally favor the 4.x
>>>> approach, with thread-spawning behavior configurable.
>>> It seems fair to wait until parallel algorithms are actually
>>> implemented.
>>> However it is not clear what you mean with "the 4.x approach": if it
>>> is actually allowing Java 7, that would mean that, starting from
>>> 4.0, we'll indeed drop support of earlier JVMs!
>>> Why would this be preferred to having 2 projects? Of course, if
>>> everyone agrees to that move to Java 7, that's fine. :-)
>> What I meant was that instead of creating a new component, we would
>> just create a new release line.  Like what tomcat does for servlet
>> spec versions.  I guess this does mean that we end up having to
>> stabilize the 3.x APIs because no additional "major" release would
>> be allowed in that line.  That would be a *good thing* IMO as long
>> as we can do it cleanly.  If not, maybe we end up having to use 5.x
>> for the JDK 1.7+ version, using 4.0 to get to a stable API for the
>> current trunk code.
> There's still the human resource problem: we don't have it to maintain
> a single branch; having two will only make it worse.

Yes, but the "new project" approach has the same problem. 
>>> On the other hand, if we keep Java 5, at least until we get use
>>> cases or contributions that would benefit from features in JDKs
>>> newer than 1.5, there is no need to create a branch; we can just go
>>> on with adding multi-thread codes to the trunk (to become part[1] of
>>> the upcoming 3.x releases).
>> That is why I wanted to get a feel for what the JDK 1.7 stuff really
>> buys you.   Has anyone seen benchmarks showing better performance
>> using 1.7 than can be obtained just using 1.5 concurrency
>> primitives?
> Again, there are separate issues:
>  1. Coding in Java 7
>  2. Running with the JVM shipped with JDK 1.7
> The newer JVMs are faster, independently of whether new features of
> the language are used.
> But it could well be that some of the new features allow even better
> performance (as is foreseen for Java 8).

Agreed.  I am interested in understanding better both how much
easier it actually is to code and whether the 1.7 framework
materially improves scheduling / allocation over what you could do
just using 1.5 primitives.
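
To make the comparison concrete, here is a minimal sketch of the same
reduction (a sum of squares) written both ways. The class name
SumOfSquares, the THRESHOLD value, and the chunking scheme are
assumptions made for illustration; none of this is existing [math]
code, and it says nothing about which variant is actually faster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.RecursiveTask;

/** Hypothetical comparison class; not part of Commons Math. */
public class SumOfSquares {
    /** Chunks at or below this size are summed sequentially (arbitrary value). */
    static final int THRESHOLD = 10000;

    /** Plain single-threaded kernel shared by both variants. */
    static double sequential(double[] x, int from, int to) {
        double s = 0;
        for (int i = from; i < to; i++) {
            s += x[i] * x[i];
        }
        return s;
    }

    /** JDK 1.5 style: fixed-size chunks submitted to a thread pool. */
    static double withExecutor(final double[] x, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Double>> parts = new ArrayList<Future<Double>>();
            int chunk = (x.length + nThreads - 1) / nThreads;
            for (int t = 0; t < nThreads; t++) {
                final int from = Math.min(x.length, t * chunk);
                final int to = Math.min(x.length, from + chunk);
                parts.add(pool.submit(new Callable<Double>() {
                    public Double call() {
                        return sequential(x, from, to);
                    }
                }));
            }
            double total = 0;
            for (Future<Double> f : parts) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** JDK 1.7 style: recursive splitting; the pool schedules the halves. */
    static class Task extends RecursiveTask<Double> {
        private final double[] x;
        private final int from;
        private final int to;

        Task(double[] x, int from, int to) {
            this.x = x;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Double compute() {
            if (to - from <= THRESHOLD) {
                return sequential(x, from, to);
            }
            int mid = (from + to) >>> 1;
            Task left = new Task(x, from, mid);
            left.fork();                                   // run left half asynchronously
            double right = new Task(x, mid, to).compute(); // right half on this thread
            return right + left.join();
        }
    }

    static double withForkJoin(double[] x) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new Task(x, 0, x.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Both variants share the same sequential kernel, so a benchmark on top
of this sketch would mostly be measuring the scheduling / coordination
cost, which is the part in question.
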
>> Has anyone used 1.7 to parallelize numerical algorithms
>> and found it really easier / more performant?
> Where are those people who could answer?

This is a public list :)
> That is one of the points I raised. If we maintain source
> compatibility with a language version that is 9 years old, not many
> contributors are going to be interested. Thus reducing the chance to
> get answers...
>> Any opinions / responses to Konstantin's comment on where
>> parallelization should be implemented - i.e. in the library vs
>> somewhere up the stack?
> What was the _question_?  ...

The question he implicitly raised was whether or not it makes sense
for a low-level library to parallelize tasks / run across cores. 
This is a legitimate question.  It may be better actually to set
things up so that higher-level frameworks or applications can
arrange parallel execution rather than embedding it in the low-level
library itself.  This is also what I was referring to when I said
that in some contexts, thread-spawning / cpu hogging may not be
desirable.
>> Any ideas how to set things up so that [math] code can play nicely
>> with concurrency frameworks?
> That's a strange question in the context of a project that tries hard
> not to have any dependency.

I did not mean necessarily to bring in dependencies; but rather to
make it easy for computational tasks executed by [math] code to be
managed by external concurrency frameworks, e.g. Hadoop.
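
One possible shape for this, as a rough sketch only: the library hands
back independent tasks and never creates a thread itself, and the
caller decides where they run. Everything here (GridSearchTasks, the
toy objective, the method names) is hypothetical and only uses
java.util.concurrent from the JDK; it is not a proposal for an actual
[math] API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical sketch; none of these names exist in Commons Math. */
public class GridSearchTasks {
    /** Toy objective to minimize: f(x) = (x - 2)^2, minimum at x = 2. */
    static double objective(double x) {
        return (x - 2) * (x - 2);
    }

    /**
     * "Library" side: one independent task per sub-interval of [lo, hi].
     * No threads are created here; each task only scans its own interval.
     */
    static List<Callable<Double>> tasks(double lo, double hi, int parts, final int samples) {
        List<Callable<Double>> out = new ArrayList<Callable<Double>>();
        final double width = (hi - lo) / parts;
        for (int p = 0; p < parts; p++) {
            final double start = lo + p * width;
            out.add(new Callable<Double>() {
                public Double call() {
                    double best = start;
                    for (int i = 1; i <= samples; i++) {
                        double x = start + i * width / samples;
                        if (objective(x) < objective(best)) {
                            best = x;
                        }
                    }
                    return best;
                }
            });
        }
        return out;
    }

    /** Picks the candidate with the smallest objective value. */
    static double best(List<Double> candidates) {
        double best = candidates.get(0);
        for (double c : candidates) {
            if (objective(c) < objective(best)) {
                best = c;
            }
        }
        return best;
    }

    /** Caller side, option A: run every task on the calling thread. */
    static double runSequentially(List<Callable<Double>> tasks) {
        List<Double> results = new ArrayList<Double>();
        try {
            for (Callable<Double> t : tasks) {
                results.add(t.call());
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return best(results);
    }

    /** Caller side, option B: hand the same tasks to any ExecutorService. */
    static double runOn(ExecutorService executor, List<Callable<Double>> tasks) {
        List<Double> results = new ArrayList<Double>();
        try {
            for (Future<Double> f : executor.invokeAll(tasks)) {
                results.add(f.get());
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return best(results);
    }
}
```

The task list is the same in both cases, so an application (or a
framework further up the stack) keeps full control over thread
creation, and a single-threaded user pays nothing for it.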

> If the requirement is to only depend on the standard JDK: the
> framework is in
>  java.util.concurrent
> and all we need to do is to define "tasks" that can be "submitted" to
> an executor:
> Regards,
> Gilles
