hama-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hama Wiki] Trivial Update of "Architecture" by udanax
Date Thu, 12 Mar 2009 09:53:27 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The following page has been changed by udanax:

- == Store Dense/Sparse Matrices ==
+ = Store Dense/Sparse Matrices =
  To store matrices, Hama uses [http://hadoop.apache.org/hbase/ HBase]. Matrices are
basically tables: a way of storing numbers and other values. A typical matrix has rows
and columns, and is called a 2-way matrix because it has two dimensions. For example, you
might have a respondents-by-attitudes matrix. If you collect the same data on the same
people at 5 points in time, you either have 5 different 2-way matrices, or you
can think of the data as one 3-way matrix, that is, respondent-by-attitude-by-time.
  -- ''Just a thought, considering the depleted activity in HBase should we not explore ways
to avoid HBase ? --Prasen ''
- == Perform matrix operations ==
+ = Perform matrix operations =
  Hadoop/HBase is designed to process large data sets efficiently by connecting many commodity
computers to work in parallel. However, inter-node communication adds overhead, so an algorithm
that communicates heavily can see its elapsed run time get worse as more nodes are added.
Consequently, an "effective" algorithm should avoid large amounts of communication.
+ == Dense Matrix-Matrix multiplication ==
+ Blocking jobs:
+  * Collect the blocks to 'collectionTable' from A and B.
+   * A map task receives a row n as its key and the row vector as its value
+    * emit (blockID, sub-vector)
+   * A reduce task combines the sub-vectors of each block
+ Multiplication job:
+  * A map task receives a blockID n as its key and two submatrices as its value
+  * A reduce task computes the sum of the blocks
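
The two jobs above can be sketched in memory as follows. This is only an illustration in plain Python, not Hama's implementation: the function names, the use of a (block_row, block_col) tuple as the blockID, and the rule that block (i, k) of A is paired with block (k, j) of B are all assumptions made for the example.

```python
from collections import defaultdict

def blocking_map(row_index, vector, block_size):
    """Map step of the blocking job: emit (blockID, sub-vector) for one row.
    The blockID is assumed to be a (block_row, block_col) tuple."""
    block_row, local_row = divmod(row_index, block_size)
    for start in range(0, len(vector), block_size):
        block_id = (block_row, start // block_size)
        yield block_id, (local_row, vector[start:start + block_size])

def blocking_reduce(pieces):
    """Reduce step of the blocking job: stack sub-vectors into one submatrix."""
    return [sub_vector for _, sub_vector in sorted(pieces)]

def to_blocks(matrix, block_size):
    """Run the blocking job: group map output by blockID, then reduce."""
    grouped = defaultdict(list)
    for row_index, vector in enumerate(matrix):
        for block_id, piece in blocking_map(row_index, vector, block_size):
            grouped[block_id].append(piece)
    return {block_id: blocking_reduce(p) for block_id, p in grouped.items()}

def multiply_map(a_block, b_block):
    """Map step of the multiplication job: multiply two submatrices."""
    return [[sum(a * b for a, b in zip(a_row, b_col)) for b_col in zip(*b_block)]
            for a_row in a_block]

def sum_reduce(products):
    """Reduce step of the multiplication job: element-wise sum of block products."""
    total = products[0]
    for product in products[1:]:
        total = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(total, product)]
    return total

def blocked_multiply(a, b, block_size):
    """Return the blocks of A * B, keyed by (block_row, block_col)."""
    a_blocks = to_blocks(a, block_size)
    b_blocks = to_blocks(b, block_size)
    partial = defaultdict(list)
    for (i, k), a_block in a_blocks.items():
        for (k2, j), b_block in b_blocks.items():
            if k == k2:  # assumed pairing: A(i, k) with B(k, j)
                partial[(i, j)].append(multiply_map(a_block, b_block))
    return {block_id: sum_reduce(p) for block_id, p in partial.items()}
```

Each submatrix product depends only on its two input blocks, so the map tasks of the multiplication job need no communication with each other; data is exchanged only when the partial products are grouped for the reduce step.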
+ == Compute the maximum absolute row sum ==
+  * A map task receives a row n as its key and the row vector as its value
+   * emit (row, the sum of the absolute values of its entries)
+  * A reduce task selects the maximum
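
A minimal in-memory sketch of these two steps, again in plain Python with hypothetical function names rather than Hama's API. The quantity computed is the maximum absolute row sum, which is also known as the infinity norm of the matrix.

```python
def row_sum_map(row_index, vector):
    """Map step: emit (row, sum of the absolute values of the row's entries)."""
    return row_index, sum(abs(x) for x in vector)

def max_reduce(pairs):
    """Reduce step: select the (row, sum) pair with the largest sum."""
    return max(pairs, key=lambda pair: pair[1])

def max_abs_row_sum(matrix):
    """Run map over every row, then reduce to the single maximum."""
    return max_reduce(row_sum_map(i, v) for i, v in enumerate(matrix))
```

For the matrix `[[1, -2, 3], [-4, 5, -6], [7, 8, -9]]` the row sums are 6, 15 and 24, so `max_abs_row_sum` returns `(2, 24)`. Each map task emits a single number per row, which keeps the data sent to the reduce task small.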
