commons-dev mailing list archives

From Phil Steitz
Subject Re: [math] JSR 247: Data Mining 2.0
Date Tue, 03 Jan 2006 05:07:23 GMT
On 1/2/06, John Gant wrote:
> Reviewed the specification, and can say that it seems to contain some
> nice algorithms. I think [math] could add some very important
> methodologies for time series analysis (for instance smoothing
> algorithms, AR, MA, ARMA (if desired), and other decomposition
> methodologies). Phil, how can [math] contribute to this specification?

I am still studying the spec, so can't yet comment fully, but in
general, I can see two ways for us to get involved:

1. Contribute to the spec itself - i.e., give feedback on the
structure and content of the API
2. Implement portions of the spec or provide wrappers for [math]
components that provide some of the functionality described by the spec.

The comment period for the "Early Draft Review" closes 11 Jan, so if
we want to get involved in option 1, we should start ASAP.  My only
general comment so far is that because the actors targeted by the spec
appear to be essentially "datamining vendors" and "API users" there is
not as much mix-and-match pluggability in the API as we might like to
see in [math] - i.e., "vendors" like us who want to provide
pluggability at multiple levels may not have the flexibility that we
would like.  This is just based on a very preliminary review, however,
and I may change my mind about this when I have worked more with the
API and more fully digested the spec.

> Noticed that the distance measures (within clustering algorithms) are
> pluggable but didn't see a list of distance measures in this spec,
> should [math] create or contribute to this list?

This is a good example illustrating how we should be thinking about
the spec.  The first question to ask is whether the API is sufficient to
provide all of the implementation flexibility that the various
clustering algorithms are going to need.  We discussed this same topic
a while back.  Assuming the answer is "yes" then no feedback is
necessary (for that part of the spec) and we can plow ahead creating
some distance measure implementations - the latter would be part of
our "vendor implementation".  The benefit of taking this approach is
that our metrics would then become (independently) useful to a broader
audience than our own clustering implementations (as would the
clustering impls themselves, if they implement the spec API).
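
To make the pluggability point concrete, here is a minimal sketch of a
distance measure as a plug-in point that a clustering step can accept
without knowing which metric it is given.  All names here
(DistanceMetric, EuclideanMetric, ManhattanMetric, nearestCentroid) are
hypothetical and chosen only for illustration; they are not taken from
the JSR 247 API or from existing [math] code.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; every name below is hypothetical. */
public class PluggableDistanceSketch {

    /** Hypothetical plug-in point: a distance between two equal-length points. */
    interface DistanceMetric {
        double distance(double[] a, double[] b);
    }

    /** Straight-line (L2) distance. */
    static final class EuclideanMetric implements DistanceMetric {
        @Override
        public double distance(double[] a, double[] b) {
            checkSameLength(a, b);
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            return Math.sqrt(sum);
        }
    }

    /** City-block (L1) distance. */
    static final class ManhattanMetric implements DistanceMetric {
        @Override
        public double distance(double[] a, double[] b) {
            checkSameLength(a, b);
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                sum += Math.abs(a[i] - b[i]);
            }
            return sum;
        }
    }

    static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
    }

    /**
     * Assignment step of a centroid-based clusterer: returns the index of
     * the centroid closest to the point under whatever metric is plugged in.
     */
    static int nearestCentroid(double[] point, List<double[]> centroids,
                               DistanceMetric metric) {
        if (centroids.isEmpty()) {
            throw new IllegalArgumentException("no centroids");
        }
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.size(); i++) {
            double d = metric.distance(point, centroids.get(i));
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] point = {0.0, 0.0};
        List<double[]> centroids = Arrays.asList(
                new double[] {3.0, 3.0},
                new double[] {0.0, 5.0});
        // Same data, different metric, different assignment.
        System.out.println(nearestCentroid(point, centroids, new EuclideanMetric())); // 0
        System.out.println(nearestCentroid(point, centroids, new ManhattanMetric())); // 1
    }
}
```

Because the clustering step depends only on the interface, the metric
implementations can be used on their own, which is the sense in which
they would be (independently) useful outside our own clustering code.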

