commons-dev mailing list archives

From Phil Steitz <phil.ste...@gmail.com>
Subject [math] Matrix storage format WAS Re: svn commit: r721203 [1/2] - in /commons/proper/math/branches/MATH_2_0: ./ src/java/org/apache/commons/math/linear/ src/site/xdoc/ src/site/xdoc/userguide/ src/test/org/apache/commons/math/linear/
Date Sat, 29 Nov 2008 15:10:36 GMT
luc.maisonobe@free.fr wrote:
> ----- "Ted Dunning" <ted.dunning@gmail.com> wrote:
>
>> Luc,
>>
>> Last I looked, I think I saw that commons math used a double-indirect
>> storage format very similar to Jama's.
>>
>> Is there any thought of moving to a higher-performance layout such as
>> the one used by Colt?
>
> Yes.
> The linear algebra package is defined using the interfaces RealMatrix and RealVector
> and separate implementations RealMatrixImpl, RealVectorImpl, etc. The purpose is to
> abstract the storage method. For now, only one representation has been used: a simple
> double[][]. It is possible to provide other implementations too, for example a single
> array, with the index computation handled by some intermediate layer. I recently saw a
> discussion on the web where it was argued that a single array is more efficient only
> for low dimensions (see http://cio.nist.gov/esd/emaildir/lists/jama/msg01425.html).
> Also, the optimization of array access in the JVM has vastly improved over the last
> few years, so I think it would be worth checking what can be observed now. Providing
> several different implementations using different strategies would be a very
> interesting feature, allowing users to choose the implementation that best suits
> their needs.
>   
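As an illustration of the single-array idea Luc describes above, a minimal sketch might look like the following. This is hypothetical code, not part of the actual [math] API; the class name FlatRealMatrix and its methods are made up for illustration.

```java
// Hypothetical sketch: a matrix backed by a single flat array, with the
// row-major index computation hidden behind an intermediate access layer.
// Not the actual commons-math API.
public class FlatRealMatrix {

    private final double[] data; // row-major storage in one contiguous array
    private final int rows;
    private final int cols;

    public FlatRealMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    public double getEntry(int row, int col) {
        // the "intermediate layer": one multiply and one add per access
        return data[row * cols + col];
    }

    public void setEntry(int row, int col, double value) {
        data[row * cols + col] = value;
    }

    public static void main(String[] args) {
        FlatRealMatrix m = new FlatRealMatrix(2, 3);
        m.setEntry(1, 2, 42.0);
        System.out.println(m.getEntry(1, 2)); // 42.0
    }
}
```

Whether the saved pointer indirection outweighs the extra index arithmetic is exactly the question the JAMA thread linked above debates, and is worth re-measuring on a current JVM.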
Interesting question.  I am +1 on alternative implementations that 
improve performance and/or add functionality (e.g. previous thread on 
labels).   Patches welcome!

Regarding the storage question, I have a couple of comments.

1. In some cases, the current impls work directly on rows (e.g. 
operate, multiply, LU decomposition), so there may be some savings 
associated with the double[][] storage for these implementations that 
could cancel whatever benefit derives from the flat layout.  Of course, 
these impls could all be replaced by more efficient implementations that 
do not benefit from direct access to rows.  The point here is that it 
would probably not help to just change the storage layout for the 
current impls, and could in fact hurt.
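To make that point concrete, here is a sketch of the row-access pattern (hypothetical code, not the actual impl): with double[][] storage, a method like operate can grab each row reference once and then iterate it directly, with no per-element index arithmetic.

```java
// Sketch of the row-access pattern that double[][] storage enables.
// Hypothetical illustration, not the actual RealMatrixImpl code.
public class RowAccessDemo {

    // matrix-vector product: out = data * v
    public static double[] operate(double[][] data, double[] v) {
        double[] out = new double[data.length];
        for (int row = 0; row < data.length; row++) {
            // direct reference to the row array; a flat layout would
            // recompute row * cols + col for every element instead
            double[] dataRow = data[row];
            double sum = 0;
            for (int col = 0; col < v.length; col++) {
                sum += dataRow[col] * v[col];
            }
            out[row] = sum;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = { {1, 2}, {3, 4} };
        double[] result = operate(m, new double[] {1, 1});
        System.out.println(result[0] + " " + result[1]); // 3.0 7.0
    }
}
```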

2. An admittedly smelly, but useful, feature of RealMatrixImpl is that it 
exposes the underlying double[][] array.  That is another reason an 
altogether new impl would be best if the storage format were to change.

As Luc said, better or faster impls, and at this point even improvements 
to the interfaces, are welcome.

Phil

> Another improvement I am thinking about is to provide specialized implementations for
> some matrix shapes. The first one would simply be band matrices. A single class could
> handle diagonal matrices (when configured with 0 subdiagonals and 0 superdiagonals),
> bidiagonal (lower and upper), tridiagonal, and even triangular matrices with a rather
> extreme configuration of n subdiagonals or superdiagonals. I think this would best be
> implemented with a single double array stored row by row, for data locality and hence
> cache efficiency. Sparse matrices would be the next step.
>
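A minimal sketch of that single-class band idea (hypothetical, not an actual [math] class) could map an in-band entry (i, j) into a row-by-row band array of width sub + sup + 1, so a tridiagonal matrix is just the configuration sub = 1, sup = 1:

```java
// Hypothetical band matrix sketch: one class covering diagonal (0, 0),
// bidiagonal (1, 0 or 0, 1) and tridiagonal (1, 1) shapes via the
// sub/superdiagonal counts, band stored row by row in a single array.
public class BandMatrix {

    private final int n;      // dimension
    private final int sub;    // number of subdiagonals
    private final int sup;    // number of superdiagonals
    private final double[] band; // n rows, each of width sub + sup + 1

    public BandMatrix(int n, int sub, int sup) {
        this.n = n;
        this.sub = sub;
        this.sup = sup;
        this.band = new double[n * (sub + sup + 1)];
    }

    public double getEntry(int i, int j) {
        int d = j - i;
        if (d < -sub || d > sup) {
            return 0.0; // structurally zero, outside the band
        }
        return band[i * (sub + sup + 1) + (d + sub)];
    }

    public void setEntry(int i, int j, double value) {
        int d = j - i;
        if (d < -sub || d > sup) {
            throw new IllegalArgumentException("entry outside the band");
        }
        band[i * (sub + sup + 1) + (d + sub)] = value;
    }

    public static void main(String[] args) {
        BandMatrix t = new BandMatrix(4, 1, 1); // tridiagonal
        t.setEntry(2, 3, 5.0);
        System.out.println(t.getEntry(2, 3)); // 5.0
        System.out.println(t.getEntry(0, 3)); // 0.0, outside the band
    }
}
```

Since each row's band is contiguous, row-wise sweeps stay cache-friendly, which is the locality argument Luc makes above.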
> We need time to do this.
>
> Luc
>
>   
>> On Thu, Nov 27, 2008 at 9:10 AM, <luc.maisonobe@free.fr> wrote:
>>
>>     
>>> Hello,
>>>
>>> This commit is the result of weeks of work. I hope it completes an
>>> important feature
>>> to [math], computation of eigenvalues and eigenvectors for symmetric
>>>       
>> real
>>     
>>> matrices.
>>>
>>> The implementation is based on algorithms developed in the last 10
>>>       
>> years or
>>     
>>> so. It is based partly on two reference papers and partly on LAPACK.
>>>       
>> Lapack
>>     
>>> is distributed under a modified-BSD license, so this is acceptable
>>>       
>> for
>>     
>>> [math]. I have updated the NOTICE file and taken care of the proper
>>> attributions in Javadoc.
>>>
>>> The current status is that we can solve eigenproblems much faster
>>>       
>> than Jama
>>     
>>> (see the speed gains in the commit message below). Furthermore, the
>>> eigenvectors are not always computed, they are computed only if
>>>       
>> needed. So
>>     
>>> applications that only need eigenvalues will benefit from a larger
>>>       
>> speed
>>     
>>> gain. This could even be improved again by allowing to compute only
>>>       
>> some
>>     
>>> eigenvalues, not all of them. This feature is available in the
>>>       
>> higher level
>>     
>>> LAPACK routine, but I didn't include it yet. I'll do it only when
>>>       
>> required,
>>     
>>> as this as already been a very large amount of work.
>>>
>>> If someone could test this new decomposition algorithm further, I
>>>       
>> would be
>>     
>>> more than happy.
>>>
>>> My next goal is now to implement Singular Value Decomposition. I
>>>       
>> will most
>>     
>>> probably use a method based on eigen decomposition as this seems to
>>>       
>> be now
>>     
>>> the prefered way since dqd/dqds and MRRR algorithms are available.
>>>
>>> Luc
>>>
>>> ----- luc@apache.org a écrit :
>>>
>>>       
>>>> Author: luc
>>>> Date: Thu Nov 27 07:50:42 2008
>>>> New Revision: 721203
>>>>
>>>> URL: http://svn.apache.org/viewvc?rev=721203&view=rev
>>>> Log:
>>>> completed implementation of EigenDecompositionImpl.
>>>> The implementation is now based on the very fast and accurate dqd/dqds
>>>> algorithm.
>>>> It is faster than Jama for all dimensions, and the speed gain increases
>>>> with dimension.
>>>> The gain is about 30% below dimension 100, about 50% around dimension
>>>> 250, and about 65% for dimensions around 700.
>>>> It is also possible to compute only eigenvalues (hence saving the
>>>> computation of eigenvectors, increasing the speed gain even further).
>>>> JIRA: MATH-220
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>> 4600 Bohannon Drive, Suite 220
>> Menlo Park, CA 94025
>> www.deepdyve.com
>> 650-324-0110, ext. 738
>> 858-414-0013 (m)
>>     
>




