hama-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hama Wiki] Update of "Architecture" by udanax
Date Wed, 05 Nov 2008 13:48:47 GMT
The following page has been changed by udanax:
http://wiki.apache.org/hama/Architecture

------------------------------------------------------------------------------
  
  ----
  = Overview =
- A parallel matrix computation package. 
+ Hama is a parallel matrix computation package.
  
- == Package Structure ==
+ Matrices are essentially tables: a way of storing numbers and other values. A typical
matrix has rows and columns, and is called a 2-way matrix because it has two dimensions.
For example, you might have a respondents-by-attitudes matrix. If you collect the same
data on the same people at 5 points in time, you either have 5 different 2-way
matrices or a single 3-way matrix, that is, respondent-by-attitude-by-time.
  
+ We chose HBase, a column-oriented sparse table store whose cells are addressed by
<row, column, timestamp>, to store the matrices.
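
To make the <row, column, timestamp> addressing concrete, here is a minimal in-memory sketch in Python. It is not the HBase API: the `CellStore` class and its methods are hypothetical names used only for illustration, and using the timestamp as the third "way" of a respondent-by-attitude-by-time matrix is an assumption based on the description above.

```python
from collections import defaultdict


class CellStore:
    """Toy stand-in for a <row, column, timestamp> sparse table (hypothetical, not HBase)."""

    def __init__(self):
        # (row, column) -> {timestamp: value}; absent cells are treated as zero.
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        versions = self._cells.get((row, column))
        if not versions:
            return 0.0  # never written: sparse zero
        if timestamp is None:
            timestamp = max(versions)  # default to the newest version
        return versions.get(timestamp, 0.0)

    def slice_at(self, timestamp):
        """Return the 2-way matrix (as a dict) recorded at one point in time."""
        return {key: v[timestamp] for key, v in self._cells.items() if timestamp in v}


store = CellStore()
store.put("respondent1", "attitude1", 1, 4.0)
store.put("respondent1", "attitude1", 2, 5.0)
store.put("respondent2", "attitude1", 2, 3.0)

latest = store.get("respondent1", "attitude1")       # newest version
first = store.get("respondent1", "attitude1", 1)     # version at time 1
missing = store.get("respondent2", "attitude2")      # unset cell reads as zero
wave_one = store.slice_at(1)                         # 2-way slice of the 3-way data
```

Only non-zero cells occupy storage in this sketch, which is the property that makes a sparse table a reasonable backing store for matrices.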
-  * org.apache.hama : Dense and structured sparse matrices
-  * org.apache.hama.algebra : Algebraic operations on map/reduce
-  * org.apache.hama.io : I/O operations with matrices and vectors
-  * org.apache.hama.mapred : Map/Reduce Input/Output Formats
-  * org.apache.hama.sparse : Unstructured sparse matrices
- ----
- == Sparse Matrix ==
- 
- '''NOTE:''' 
- 
-  * Sparse matrix operations cannot be optimized
-  * Sparse structures which are growable can exceed the initial bandwidth allocation, while
those which are not growable are fixed, and over-allocation will cause an error
-  * Matrices which are column major typically perform better with column-oriented operations,
and likewise for row major matrices. Matrix/vector multiplication is row-major, while transpose
multiplication is column-major
- 
- 
- === Why sparse matrices? ===
- 
-  * Many classes of problems result in matrices with a large number of zeros
-  * A sparse matrix is a special class of matrix that allows only the non-zero terms to be
stored
-  * Reduction in the storage requirements for sparse matrices
-  * Significant speed improvement as many calculations involving zero elements are neglected
- 
- === Storage of sparse matrices ===
- 
- We chose HBase, a column-oriented sparse table store, to reduce storage and complexity.
  
   * Hama uses column-oriented storage of matrices (HBase), so compressed column format
is a natural choice for sparse storage
   * Hama forces the elements of each column to be stored in increasing order of their row
index
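
As an illustration of compressed column format with row indices kept in increasing order within each column, here is a minimal Python sketch. The function name `to_compressed_column` and the three-array layout (`values`, `row_idx`, `col_ptr`) are assumptions that follow the common textbook form of the format; the page does not state how Hama lays this out internally.

```python
def to_compressed_column(entries, n_cols):
    """Build compressed column arrays from (row, col, value) triples.

    Indices are zero-based and the input is assumed to contain no duplicates.
    """
    # Sort by column first, then by row, so rows increase within each column.
    ordered = sorted((c, r, v) for r, c, v in entries if v != 0)
    values = [v for _, _, v in ordered]
    row_idx = [r for _, r, _ in ordered]

    # col_ptr[c] is the offset of column c's first stored element;
    # col_ptr[c + 1] - col_ptr[c] is the number of non-zeros in column c.
    col_ptr = [0] * (n_cols + 1)
    for c, _, _ in ordered:
        col_ptr[c + 1] += 1
    for c in range(n_cols):
        col_ptr[c + 1] += col_ptr[c]
    return values, row_idx, col_ptr


# Non-zeros of a 3x3 matrix, deliberately given out of order:
#   1 0 0
#   0 3 1
#   4 0 0
entries = [(2, 0, 4.0), (0, 0, 1.0), (1, 1, 3.0), (1, 2, 1.0)]
values, row_idx, col_ptr = to_compressed_column(entries, n_cols=3)
```

Column 0 holds two elements, and they come out ordered by row (row 0 before row 2) regardless of the input order.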
  
- {{{
-   1  0  0       (1,1) = 1           
-   0  3  1       (2,2) = 3
-   0  0  0       (2,3) = 1
- }}}
- 
  See also: [http://labs.google.com/papers/bigtable-osdi06.pdf Bigtable: A Distributed Storage
System for Structured Data]
  
- === Pseudo code for sparse matrix addition ===
+ ----
  
+ == Parallel Strategies for Dense Matrix ==
- '''NOTE:''' 
- 
-  * There are no duplicates in the input.
- ----
- == Parallel Strategies ==
  
  In Map/Reduce programming, users can easily take advantage of the parallel data layouts
and communication paradigms below.
  
