Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.
The following page has been changed by udanax:
http://wiki.apache.org/hama/Architecture


= Overview =
 A parallel matrix computation package.
+ Hama is a parallel matrix computation package.
 == Package Structure ==
+ Matrices are basically tables: ways of storing numbers and other data. A typical
matrix has rows and columns and is called a 2-way matrix because it has two dimensions.
For example, you might have a respondents-by-attitudes matrix. If you collect the same
data on the same people at 5 points in time, you either have 5 different 2-way
matrices, or you can think of the data as a 3-way matrix, that is, respondent-by-attitude-by-time.
+ We chose HBase, a column-oriented sparse table store addressed by <row, column, timestamp>,
to store the matrices.
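
As a rough illustration of the idea above, the sketch below models a <row, column, timestamp> sparse table with a plain Python dictionary. It is not HBase client code; the `SparseTable` class and its methods are hypothetical names chosen for this example. Only non-zero cells are stored, and the timestamp plays the role of the third dimension of a 3-way matrix.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory stand-in for a <row, column, timestamp> table (hypothetical)."""

    def __init__(self):
        # cells[(row, column)] maps timestamp -> value; absent cells mean zero.
        self.cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        # Zeros are never stored, which is what keeps the table sparse.
        if value != 0:
            self.cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp):
        # Use .get on the outer dict so lookups do not create empty entries.
        return self.cells.get((row, column), {}).get(timestamp, 0)

    def slice_at(self, timestamp):
        """Return the 2-way matrix observed at one point in time as a dict."""
        return {
            key: versions[timestamp]
            for key, versions in self.cells.items()
            if timestamp in versions
        }


table = SparseTable()
# Respondent-by-attitude scores collected at two points in time.
table.put("respondent1", "attitudeA", 1, 4.0)
table.put("respondent1", "attitudeA", 2, 5.0)
table.put("respondent2", "attitudeB", 1, 3.0)
```

Calling `table.slice_at(1)` recovers the 2-way matrix for the first point in time, while the full table can be read as the 3-way matrix.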
 * org.apache.hama : Dense and structured sparse matrices
 * org.apache.hama.algebra : Algebraic operations on map/reduce
 * org.apache.hama.io : I/O operations with matrices and vectors
 * org.apache.hama.mapred : Map/Reduce Input/Output Formats
 * org.apache.hama.sparse : Unstructured sparse matrices
 
 == Sparse Matrix ==

 '''NOTE:'''

 * Sparse matrix operations cannot be optimized
 * Sparse structures which are growable can exceed the initial bandwidth allocation, while
those which are not growable are fixed, and over-allocation will cause an error
 * Matrices which are column-major typically perform better with column-oriented operations,
and likewise for row-major matrices. Matrix/vector multiplication is row-major, while transpose
multiplication is column-major


 === Why sparse matrices? ===

 * Many classes of problems result in matrices with a large number of zeros
 * A sparse matrix is a special class of matrix that allows only the nonzero terms to be
stored
 * Reduction in the storage requirements for sparse matrices
 * Significant speed improvement, as many calculations involving zero elements are skipped

 === Storage of sparse matrices ===

 We chose HBase, a column-oriented sparse table store, to reduce storage and complexity.
* Hama uses column-oriented storage of matrices (HBase), so compressed column format
is a natural choice of sparse storage
* Hama forces the elements of each column to be stored in increasing order of their row
index
 {{{
 1 0 0 (1,1) = 1
 0 3 1 (2,2) = 3
 0 0 0 (2,3) = 1
 }}}
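
To make the compressed column format concrete, here is a minimal sketch that converts the example matrix above into the three usual compressed-column arrays. The function name `to_compressed_column` is hypothetical, indices are 0-based (the listing above is 1-based), and this shows only the general technique, not Hama's actual storage code.

```python
def to_compressed_column(dense):
    """Convert a dense list-of-rows matrix to compressed column arrays.

    Returns (values, row_indices, col_pointers), all 0-based. Within each
    column, entries appear in increasing row order because rows are scanned
    top to bottom.
    """
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    values, row_indices, col_pointers = [], [], [0]
    for col in range(n_cols):
        for row in range(n_rows):
            if dense[row][col] != 0:
                values.append(dense[row][col])
                row_indices.append(row)
        # col_pointers[col + 1] marks where the next column starts in values.
        col_pointers.append(len(values))
    return values, row_indices, col_pointers


matrix = [
    [1, 0, 0],
    [0, 3, 1],
    [0, 0, 0],
]
values, row_indices, col_pointers = to_compressed_column(matrix)
```

The entries of column `j` are `values[col_pointers[j]:col_pointers[j + 1]]`, so column-oriented operations can walk a column without touching any zero.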

See also: [http://labs.google.com/papers/bigtableosdi06.pdf Bigtable: A Distributed Storage
System for Structured Data]
 === Pseudo code for sparse matrix addition ===
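
The following is only a minimal single-machine sketch of sparse matrix addition, not Hama's map/reduce implementation. It assumes each matrix is given as a dictionary keyed by (row, column) with no duplicate keys, and the function name `sparse_add` is hypothetical.

```python
def sparse_add(a, b):
    """Add two sparse matrices stored as {(row, column): value} dictionaries."""
    result = dict(a)
    for key, value in b.items():
        total = result.get(key, 0) + value
        if total != 0:
            result[key] = total
        else:
            # Entries that cancel out are dropped so only non-zeros are stored.
            result.pop(key, None)
    return result


a = {(0, 0): 1, (1, 1): 3, (1, 2): 1}
b = {(0, 0): 2, (1, 2): -1, (2, 0): 5}
c = sparse_add(a, b)
```

Only keys present in at least one input are visited, so the cost depends on the number of non-zero entries rather than on the matrix dimensions.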
+ 
+ == Parallel Strategies for Dense Matrix ==
 '''NOTE:'''

 * There are no duplicates in the input.
 
 == Parallel Strategies ==
In Map/Reduce programming, users can easily take advantage of the parallel data layouts and
communication paradigms below.
