commons-dev mailing list archives

From "Bradford Cross" <>
Subject Re: [Math] rolling calculations with lag
Date Thu, 04 Oct 2007 04:22:17 GMT
OK, I have created a patch. I tried to follow the instructions for filing a
bug in Bugzilla, but I can't seem to find the right place to file a new bug
against either Commons or Commons Math.

I wonder if someone could help me out.


On 9/29/07, Phil Steitz <> wrote:
> On 9/22/07, Bradford Cross <> wrote:
> > Greetings!
> >
> > Recently I stumbled into the Commons math project; nice design, good
> > abstractions, "smart updates" and even unit tests! :-)
> >
> Thanks!
> > The smart updates are a key feature for event stream processing / time
> > series simulation.  The only piece that is missing from a time series
> > analysis and simulation perspective is the ability to supply a lag that
> > defines a fixed sample size and perform rolling calculations.
> >
> That functionality actually already exists in the
> DescriptiveStatistics class.  You can set a "window size" for rolling
> computations of univariate statistics using the concrete
> implementation of this class,
> o.a.c.math.stat.descriptive.DescriptiveStatisticsImpl.  See
> > I was very happy to see this as an item on the wish list.
> The wishlist item is not as clear as it could be.  Sorry about that.
> In addition to the computations in DescriptiveStatistics that require
> that you maintain all of the values in the current window in memory,
> we also support "storeless" computation of statistics that can be
> computed in one pass through the data. This allows very large data
> streams to be handled with fixed storage overhead.  I think that what
> the wishlist item refers to is something in between - ways to support
> the window concept without storing all of the data.  Strictly
> speaking, this is impossible, but doing things like sampling from the
> streams, periodically resetting or maintaining arrays of storeless
> stats with different offsets would in theory be possible.
> >
> > A ThoughtWorks colleague (Yaxin Wang) and I are prototyping a java time
> > series simulation engine and we are considering the commons math as the
> > base of our numerical libraries.  In order to do this we need to complete
> > the rolling calculations, so here is our first spike (spike means
> > prototype that can be thrown away / not a real patch.)  We thought we
> > would start with an easy case; mean, which uses sum.
> >
> > We have already combined the rolling calculations with the smart update
> > algorithms before in the numerical libraries for our previous time
> > series simulation engine.  As you have mentioned in the wish list notes,
> > our past experience is that some of the algorithms cannot avoid using
> > queues in the rolling-update case.  Obviously it is something pretty
> > fundamental to the design and requires a bit of work across a lot of
> > places to do this for all the statistics (at least starting with summary
> > statistics.)
> >
> > Please give feedback on the design, any issues with performance (better
> > data structure than the queue we used), etc!
> >
> > If the community is OK with this initial spike, then we can start
> > submitting patches. :-)
> >
> Thanks for the contribution! There are a few problems with
> incorporating the code as is, though.  First it uses generics and the
> concurrent package, which requires JDK 1.5 and our current minimum JDK
> level is 1.3.  That could probably be eliminated fairly easily,
> though.  The second is really whether or not the queue implementation
> is going to improve performance over the ResizableDoubleArray store
> that DescriptiveStatisticsImpl uses now.  If you think so and can
> demonstrate with benchmarks, we can talk about swapping out that
> implementation.  Otherwise, it's probably better to use
> ResizableDoubleArray.
> I am +1 on adding a RollingStatistic abstract base class (would prefer
> that name to "Statistic" since it is specialized) like you have
> defined and rolling versions of the individual statistics.  This would
> be a convenience over the current setup and provide a more intuitive
> way to access rolling stats than to use DescriptiveStatisticsImpl as a
> container.  Currently this is the only way to do it.  So if you
> can refactor to either use ResizableDoubleArray as the backing store
> (look at DescriptiveStatisticsImpl.apply - the convenience classes
> could just use that pattern) or otherwise eliminate the JDK 1.5
> dependency, I would support adding the rolling stats.  If I understand
> correctly the idea of what you mean by Sum, and Mean (using
> constructor arguments to determine whether or not statistic is
> rolling), I would prefer to leave the existing statistics in
> commons-math as is and introduce Rolling versions as separate classes.
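
To make the proposal above concrete, here is a rough sketch of what a rolling base class with separate rolling sum and mean classes might look like. It is not the spike attached to this thread and not commons-math code: the names RollingStatisticSketch, RollingSumSketch, RollingMeanSketch, onAdd and onEvict are all hypothetical, and a plain double[] circular buffer is assumed in place of ResizableDoubleArray so that the example has no library dependency. It avoids generics and java.util.concurrent, in line with the JDK constraint mentioned above.

```java
/** Hypothetical base class: owns the fixed-size window and tells subclasses what enters and leaves. */
abstract class RollingStatisticSketch {
    private final double[] window; // circular buffer of the last `lag` values
    private int next;              // index of the oldest value once the buffer is full
    private int count;             // number of values currently in the window

    protected RollingStatisticSketch(int lag) {
        if (lag <= 0) {
            throw new IllegalArgumentException("lag must be positive");
        }
        window = new double[lag];
    }

    public final void increment(double x) {
        if (count == window.length) {
            onEvict(window[next]); // the oldest value falls out of the window
        } else {
            count++;
        }
        window[next] = x;
        next = (next + 1) % window.length;
        onAdd(x);
    }

    public final int getN() {
        return count;
    }

    protected abstract void onAdd(double x);

    protected abstract void onEvict(double x);

    public abstract double getResult();
}

/** Hypothetical rolling sum, updated in O(1) per value. */
class RollingSumSketch extends RollingStatisticSketch {
    private double sum;

    RollingSumSketch(int lag) {
        super(lag);
    }

    protected void onAdd(double x) {
        sum += x;
    }

    protected void onEvict(double x) {
        sum -= x; // repeated add/subtract can accumulate rounding error on long streams
    }

    public double getResult() {
        return getN() == 0 ? Double.NaN : sum;
    }
}

/** Hypothetical rolling mean: the rolling sum divided by the current window count. */
class RollingMeanSketch extends RollingSumSketch {
    RollingMeanSketch(int lag) {
        super(lag);
    }

    public double getResult() {
        return getN() == 0 ? Double.NaN : super.getResult() / getN();
    }
}
```

Keeping the rolling classes separate, as suggested above, leaves the existing statistics untouched. Whether the incremental add/subtract update is preferable to recomputing over the stored window is a design choice that would need the kind of benchmarking mentioned earlier in this message.
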
> One more thing.  It is very important that any contributions that you
> make can be made in accordance with the Apache Contributor's License
> Agreement.  Have a look here:
> and make sure you can agree to those terms.  Then you can start
> submitting patches with attachments to Jira tickets.
> Thanks!
> Phil
