hama-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hama Wiki] Trivial Update of "RoadMap" by Edward J. Yoon
Date Sun, 27 Jun 2010 13:02:25 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "RoadMap" page has been changed by Edward J. Yoon.


+ = Plans for 0.2.0 release =
- = Short-term Issues (for 0.2.0 release) =
- [[http://markmail.org/search/?q=hama-dev+discuss#query:hama-dev%20discuss+page:1+mid:amlvccbptom3yro3+state:results]]
- == Re-factoring issues ==
   * Move current code related to matrix operations to the 'examples' package [[https://issues.apache.org/jira/browse/HAMA-243|HAMA-243]]
   * A design of structure of the matrix/graph
- == BSP issues ==
   * Consider more simplified BSP programming interface [[https://issues.apache.org/jira/browse/HAMA-244|HAMA-244]]
   * BSP examples [[https://issues.apache.org/jira/browse/HAMA-221|HAMA-221]]
   * Hadoop RPC performance analysis [[https://issues.apache.org/jira/browse/HAMA-245|HAMA-245]]
   * [documentation] Parallel, and Distributed Programming With BSP in Hama [[https://issues.apache.org/jira/browse/HAMA-248|HAMA-248]]
+ ----
+ = Plans for 0.3.0 release =
- ----
- = Long-term Issues =
- We plan to redesign Hama around the BSP model and to target shared-nothing systems
consisting of several thousand commodity servers, which are generally called cloud
computing environments.
- == Why BSP? ==
+  * Add in/output system
+  * More reliable fault-tolerant system
+  * Web-UI monitoring tool of BSP job progress
+ And, ...
- For the graph package, BSP is also necessary for Hama to process graph data efficiently
in shared-nothing architectures. The essence of graph data is the connectivity between vertices.
During processing, Hama needs not only a vertex's data but also its adjacent vertices'
data. Assume that we have a graph data set partitioned into cohesive subgraphs; that
is, adjacent vertices are stored in the same physical storage or as near to each other as possible.
Even with well-partitioned graphs, MapReduce cannot exploit this locality, since
it reads input data sequentially and cannot control its input data. In addition, its partitioner
hashes the input data. The BSP model, however, enables graph processing to be performed efficiently
while preserving the locality of graph data.
- === Design Considerations ===
-  * Fault Tolerance - Hama aims to run on several thousand commodity servers, so
it is subject to faults. In addition, Hama is for large-scale processing that generally
takes a long time, ranging from a few minutes to several hours. Therefore, it is important for
Hama to finish the given jobs even if faults occur during processing. Otherwise, Hama has to
restart all jobs.
-  * Heterogeneity - 
-  * Efficiency
-  * Easy to Use
- (Working)
- === TODO ===
-  * A survey on matrix and graph processing algorithms based on BSP programming model.
-  * Developing a fault-tolerant mechanism for BSP model.
-  * Developing a struggling mechanism for BSP model.
-  * Implement BSP frameworks based on the source code that we have done.
-  * A selection of the primitive operations for matrix processing and linear algebra.
-  * Implement the primitive operations for matrix and linear algebra.
-  * Develop operation models based on the primitive operations developed above.
-  * Implement processing framework for matrix and linear algebra.
-  * Design a domain-specific language that reflects algebraic characteristics well.
-  * A selection of the primitive operations for graph processing.
-  * Develop operation models based on the above primitive operation for large-scale graph
- (Working)
  = Idea Generating and Research Tasks =
