Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.
The following page has been changed by udanax:
http://wiki.apache.org/hama/Architecture


= Overview =
 Hama is a parallel matrix computation package.
 Matrices are basically tables: a way of storing numbers and other things. A typical
matrix has rows and columns, and is called a 2-way matrix because it has two dimensions.
For example, you might have a respondents-by-attitudes matrix. Of course, you might collect
the same data on the same people at 5 points in time. In that case, you either have 5
different 2-way matrices, or you can think of the data as a 3-way matrix, that is,
respondent-by-attitude-by-time.
+ Hama uses [http://hadoop.apache.org/hbase/ HBase] to store the matrices. Matrices are
basically tables: a way of storing numbers and other things. A typical matrix has rows
and columns, and is called a 2-way matrix because it has two dimensions. For example, you
might have a respondents-by-attitudes matrix. Of course, you might collect the same data on
the same people at 5 points in time. In that case, you either have 5 different 2-way
matrices, or you can think of the data as a 3-way matrix, that is,
respondent-by-attitude-by-time.

 We chose HBase, a column-oriented sparse table store addressed by <row, column, timestamp>,
to store the matrices.

 * Hama uses column-oriented storage of matrices (HBase), so compressed column format
is a natural choice for sparse storage
 * Hama forces the elements of each column to be stored in increasing order of their row
index
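
 The two points above can be illustrated with a small, self-contained sketch of compressed column storage. This is not Hama or HBase code; the function names and the three-array layout (`values`, `row_index`, `col_ptr`) are assumptions chosen for illustration. Within each column the row indices are kept in increasing order, which is what makes a binary search lookup possible.

```python
from bisect import bisect_left


def to_compressed_column(dense):
    """Convert a dense matrix (list of rows) to compressed column form.

    Returns (values, row_index, col_ptr). Column j occupies the slice
    col_ptr[j]:col_ptr[j + 1] of values and row_index.
    """
    n_rows = len(dense)
    n_cols = len(dense[0]) if dense else 0
    values, row_index, col_ptr = [], [], [0]
    for j in range(n_cols):
        # Visiting rows in ascending order keeps each column's
        # row indices sorted.
        for i in range(n_rows):
            v = dense[i][j]
            if v != 0:
                values.append(v)
                row_index.append(i)
        col_ptr.append(len(values))
    return values, row_index, col_ptr


def get(values, row_index, col_ptr, i, j):
    """Look up element (i, j); entries that are not stored are zero."""
    start, end = col_ptr[j], col_ptr[j + 1]
    # Binary search is valid only because row indices are sorted per column.
    k = bisect_left(row_index, i, start, end)
    if k < end and row_index[k] == i:
        return values[k]
    return 0


dense = [
    [1, 0, 2],
    [0, 0, 3],
    [4, 5, 0],
]
values, row_index, col_ptr = to_compressed_column(dense)
```

 Only the non-zero entries are stored, so the cost of scanning a column is proportional to the number of non-zeros in it rather than to the number of rows.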

 See also: [http://labs.google.com/papers/bigtableosdi06.pdf Bigtable: A Distributed Storage
System for Structured Data]

 == Parallel Strategies for Dense Matrix ==
+ == Dense Matrix ==
+ For dense matrix computations, block-partitioned algorithms are used to minimize data
movement and network cost. Dense Matrix and Blocked Dense Matrix are both stored in one
table along with other metadata.
 In Map/Reduce programming, users can easily take advantage of the parallel data layouts
and communication paradigms below.
+ (BTW, how do we synchronize them? Should we block while operating on it? Edward)
 * 1D Row Blocked Layout
 * 1D Row Block Cyclic Layout
 * 2D Row and Column Blocked Layout
 * 2D Row and Column Block Cyclic Layout
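
 As a rough illustration of how blocked and block cyclic layouts differ, the sketch below maps a row (or element) index to the worker that would own it. The function names, the notion of a numbered "worker", and the `block_size` parameter are assumptions made for this example; they are not part of Hama's API.

```python
def blocked_owner(i, n_rows, n_workers):
    """1D row blocked: each worker owns one contiguous chunk of rows."""
    chunk = -(-n_rows // n_workers)  # ceiling division
    return i // chunk


def block_cyclic_owner(i, block_size, n_workers):
    """1D row block cyclic: fixed-size blocks are dealt out round-robin."""
    return (i // block_size) % n_workers


def block_cyclic_owner_2d(i, j, block_size, grid_rows, grid_cols):
    """2D block cyclic: apply the 1D rule to rows and columns independently,
    giving a (grid row, grid column) coordinate in the worker grid."""
    return ((i // block_size) % grid_rows, (j // block_size) % grid_cols)


# Ownership of 8 rows under each 1D layout.
blocked = [blocked_owner(i, 8, 4) for i in range(8)]
cyclic = [block_cyclic_owner(i, 2, 2) for i in range(8)]
```

 The blocked layout keeps neighbouring rows together, while the cyclic layout spreads blocks across workers, which tends to balance load when the amount of work varies across the matrix.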

 === Square blocking ===

 The matrix multiplication of the original arrays can be transformed into matrix multiplication
of blocks. For example,
+ For example, the matrix multiplication of the original arrays can be transformed into
matrix multiplication of blocks, as described below.
C_block(1,1)=A_block(1,1)*B_block(1,1) + A_block(1,2)*B_block(2,1)
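
 The block formula above can be checked with a small sketch. It assumes square matrices whose size is a multiple of the block size and uses 0-based block indices, so `C_block(1,1)` in the text corresponds to block `(0, 0)` here. All helper names are hypothetical and are not taken from Hama.

```python
def matmul(a, b):
    """Plain matrix multiplication on lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def get_block(m, bi, bj, size):
    """Extract the size x size block at block coordinates (bi, bj)."""
    return [row[bj * size:(bj + 1) * size] for row in m[bi * size:(bi + 1) * size]]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def blocked_matmul(a, b, size):
    """Multiply square matrices block by block:
    C_block(i, j) = sum over k of A_block(i, k) * B_block(k, j)."""
    n = len(a)
    n_blocks = n // size  # assumes n is a multiple of size
    c = [[0] * n for _ in range(n)]
    for bi in range(n_blocks):
        for bj in range(n_blocks):
            acc = [[0] * size for _ in range(size)]
            for bk in range(n_blocks):
                acc = add(acc, matmul(get_block(a, bi, bk, size),
                                      get_block(b, bk, bj, size)))
            # Write the finished block back into the result matrix.
            for r in range(size):
                c[bi * size + r][bj * size:(bj + 1) * size] = acc[r]
    return c


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
c = blocked_matmul(a, b, 2)
```

 Each block product is independent of the others, so in a Map/Reduce setting the products could be computed in parallel and the partial results for the same `(i, j)` summed afterwards.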
@@ 46, +32 @@
+++ +++ +++
}}}
 So we can reduce the time spent on full scans.

