hama-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hama Wiki] Update of "Architecture" by udanax
Date Sat, 15 Nov 2008 13:29:08 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/hama/Architecture

------------------------------------------------------------------------------
  
  ----
  = Overview =
- Hama is a parallel matrix computational package. 
  
- Matrices are basically tables. They are ways of storing numbers and other things. Typical matrix has rows and columns. Actually called a 2-way matrix because it has two dimensions. For example, you might have respondents-by-attitudes. Of course, you might collect the same data on the same people at 5 points in time. In that case, you either have 5 different 2-way matrices, or you could think of it as a 3-way matrix, that is respondent-by-attitude-by-time.
+ Hama uses [http://hadoop.apache.org/hbase/ HBase] to store the matrices. Matrices are basically tables; they are a way of storing numbers and other things. A typical matrix has rows and columns and is called a 2-way matrix because it has two dimensions. For example, you might have respondents-by-attitudes. Of course, you might collect the same data on the same people at 5 points in time. In that case, you either have 5 different 2-way matrices, or you can think of it as a 3-way matrix, that is, respondent-by-attitude-by-time.
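
As a rough sketch of this storage model, the example below writes a single matrix element into an HBase table, using the matrix row index as the HBase row key and the matrix column index as the column qualifier. It is written against the current HBase client API rather than the API of 2008, and the table name "matrix_a" and the column family "column" are names invented for the example, not Hama's actual schema.

{{{
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MatrixCellWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         // "matrix_a" and the "column" family are invented names, not Hama's actual schema.
         Table table = conn.getTable(TableName.valueOf("matrix_a"))) {
      int row = 0, col = 2;   // the element a(0, 2)
      double value = 3.5;
      // Matrix row index becomes the HBase row key, matrix column index the column qualifier.
      Put put = new Put(Bytes.toBytes(String.valueOf(row)));
      put.addColumn(Bytes.toBytes("column"), Bytes.toBytes(String.valueOf(col)),
          Bytes.toBytes(value));
      table.put(put);
    }
  }
}
}}}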
- 
- We choosed Hbase which <row, column, timestamp> column-oriented sparse table storage to store the matrices.
- 
-  * Hama use column-oriented storage of matrices (HBase) , and so compressed column format is a natural choice of sparse storage
-  * Hama forces the elements of each column to be stored in increasing order of their row index
- 
- See also: [http://labs.google.com/papers/bigtable-osdi06.pdf Bigtable], A Distributed Storage System for Structured Data
  
  ----
  
- == Parallel Strategies for Dense Matrix ==
+ == Dense Matrix ==
+ For dense matrix computations, block-partitioned algorithms are used to minimize data movement and network cost. Dense Matrix and Blocked Dense Matrix are both stored in one table together with other metadata.
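
To make the block-partitioning idea concrete, here is a plain-Java sketch that splits a square dense matrix into fixed-size sub-blocks. The in-memory representation and the assumption that the dimension is a multiple of the block size are only for illustration; this says nothing about how Hama actually lays the blocks out in the table.

{{{
import java.util.Arrays;

public class BlockPartition {
  /** Splits an n x n matrix into (n/blockSize)^2 square blocks; blocks[bi][bj] is block (bi, bj). */
  static double[][][][] partition(double[][] a, int blockSize) {
    int n = a.length;                 // assumes a is square and n is a multiple of blockSize
    int numBlocks = n / blockSize;
    double[][][][] blocks = new double[numBlocks][numBlocks][blockSize][blockSize];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        blocks[i / blockSize][j / blockSize][i % blockSize][j % blockSize] = a[i][j];
      }
    }
    return blocks;
  }

  public static void main(String[] args) {
    double[][] a = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}};
    double[][][][] blocks = partition(a, 2);
    // Prints the top-right 2 x 2 block: [[3.0, 4.0], [7.0, 8.0]]
    System.out.println(Arrays.deepToString(blocks[0][1]));
  }
}
}}}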
  
- In Map/Reduce programming, user can easily take advantage of the below parallel data layouts, communication paradigms.
+ (BTW, how do we synchronize them? Should we block while operating on them? -- Edward)
  
-  * 1D Row Blocked Layout
-  * 1D Row Block Cyclic Layout
-  * 2D Row and Row Blocked Layout
-  * 2D Row and Row Block Cyclic Layout
- 
- === Square blocking ===
- 
- The matrix multiplication of the original arrays can be transformed into matrix multiplication of blocks. For example,
+ For example, the matrix multiplication of the original arrays can be transformed into a matrix multiplication of blocks, as described below.
  
  C_block(1,1)=A_block(1,1)*B_block(1,1) + A_block(1,2)*B_block(2,1)
  
@@ -46, +32 @@

  +-----+-----+     +-----+-----+   +-----+-----+
  }}}
  
- So, we can reduce the time of full scan.
- 
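
As a numeric check of the block identity above, the sketch below multiplies two 4 x 4 matrices once directly and once via 2 x 2 blocks, and prints the top-left block computed both ways. The data values and block size are made up for the example.

{{{
import java.util.Arrays;

public class BlockMultiply {
  /** Plain matrix product z = x * y. */
  static double[][] multiply(double[][] x, double[][] y) {
    double[][] z = new double[x.length][y[0].length];
    for (int i = 0; i < x.length; i++)
      for (int j = 0; j < y[0].length; j++)
        for (int k = 0; k < y.length; k++)
          z[i][j] += x[i][k] * y[k][j];
    return z;
  }

  /** Element-wise sum z = x + y. */
  static double[][] add(double[][] x, double[][] y) {
    double[][] z = new double[x.length][x[0].length];
    for (int i = 0; i < x.length; i++)
      for (int j = 0; j < x[0].length; j++)
        z[i][j] = x[i][j] + y[i][j];
    return z;
  }

  /** Copies the blockSize x blockSize sub-block at block position (bi, bj), 0-based. */
  static double[][] block(double[][] a, int bi, int bj, int blockSize) {
    double[][] b = new double[blockSize][blockSize];
    for (int i = 0; i < blockSize; i++)
      for (int j = 0; j < blockSize; j++)
        b[i][j] = a[bi * blockSize + i][bj * blockSize + j];
    return b;
  }

  public static void main(String[] args) {
    double[][] a = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}};
    double[][] b = {{1, 0, 2, 0}, {0, 1, 0, 2}, {3, 0, 4, 0}, {0, 3, 0, 4}};

    // Top-left block of the full product A * B (block indices are 0-based here,
    // while the wiki formula uses 1-based indices).
    double[][] direct = block(multiply(a, b), 0, 0, 2);

    // A_block(1,1) * B_block(1,1) + A_block(1,2) * B_block(2,1)
    double[][] blocked = add(
        multiply(block(a, 0, 0, 2), block(b, 0, 0, 2)),
        multiply(block(a, 0, 1, 2), block(b, 1, 0, 2)));

    System.out.println(Arrays.deepToString(direct));   // same values...
    System.out.println(Arrays.deepToString(blocked));  // ...computed block-wise
  }
}
}}}

Both printed blocks are identical, which is exactly the identity C_block(1,1) = A_block(1,1)*B_block(1,1) + A_block(1,2)*B_block(2,1).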
