hama-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hama Wiki] Update of "GroomServerFaultTolerance" by ChiaHungLin
Date Wed, 11 May 2011 02:28:28 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "GroomServerFaultTolerance" page has been changed by ChiaHungLin.
http://wiki.apache.org/hama/GroomServerFaultTolerance?action=diff&rev1=6&rev2=7

--------------------------------------------------

  
  === Literature Review ===
  
- In general, a system designed to deal with failures largely bases on the concepts including unit of mitigation, redundancy, fault observer[4]. 
+ In general, a system designed to deal with failures usually needs to apply techniques including unit of mitigation, redundancy, fault detection, and fault recovery[4]. 
  
  
- The architecture defines the basic unit which performs functions of a system according to requirements.   
+ Unit of mitigation: GroomServer(s)/BSPMaster
  
- Providing redundant units. 
+ Redundant units: GroomServer(s)
  
- Fault observers are designed to detect fault or error in an earlier stage so that other strategies, such as error recovery can be employed to correct the problem. 
+ Fault detection: system monitor, heartbeat.
  
- 
+ Fault recovery: failover
  
  === Architecture ===
  '''Task Failure'''
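The heartbeat-based fault detection and failover listed in the Literature Review can be illustrated with a small sketch. It is not Hama's implementation. The class HeartbeatMonitor, its methods, the timeout value, and the round-robin reassignment policy are all hypothetical assumptions. In this sketch the master side records the time of each GroomServer's latest heartbeat, treats a GroomServer as failed once that heartbeat is older than a timeout, and moves the failed server's tasks to the remaining (redundant) GroomServers. BSPMaster failure is not modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch, not Hama's API: heartbeat-based fault detection
 * plus failover of tasks to surviving GroomServers.
 */
class HeartbeatMonitor {
    private final long timeoutMillis;
    // GroomServer name -> time of its most recent heartbeat (sorted for determinism)
    private final Map<String, Long> lastHeartbeat = new TreeMap<>();
    // GroomServer name -> ids of the tasks currently assigned to it
    private final Map<String, List<String>> assignments = new TreeMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** A GroomServer reports that it is alive at time nowMillis. */
    void recordHeartbeat(String groom, long nowMillis) {
        lastHeartbeat.put(groom, nowMillis);
        assignments.putIfAbsent(groom, new ArrayList<>());
    }

    void assignTask(String groom, String taskId) {
        assignments.computeIfAbsent(groom, g -> new ArrayList<>()).add(taskId);
    }

    /** Returns a copy of the tasks assigned to a GroomServer. */
    List<String> tasksOf(String groom) {
        return new ArrayList<>(assignments.getOrDefault(groom, Collections.emptyList()));
    }

    /** Fault detection: GroomServers whose last heartbeat is older than the timeout. */
    List<String> detectFailed(long nowMillis) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }

    /** Fault recovery: drop failed GroomServers and reassign their tasks round-robin. */
    List<String> failOver(long nowMillis) {
        List<String> failed = detectFailed(nowMillis);
        List<String> alive = new ArrayList<>(lastHeartbeat.keySet());
        alive.removeAll(failed);
        List<String> orphaned = new ArrayList<>();
        for (String groom : failed) {
            orphaned.addAll(tasksOf(groom));
        }
        // Check before mutating state so no task is lost if there is nowhere to move it.
        if (alive.isEmpty() && !orphaned.isEmpty()) {
            throw new IllegalStateException("no live GroomServer to fail over to");
        }
        for (String groom : failed) {
            lastHeartbeat.remove(groom);
            assignments.remove(groom);
        }
        for (int i = 0; i < orphaned.size(); i++) {
            assignTask(alive.get(i % alive.size()), orphaned.get(i));
        }
        return failed;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(10_000L);
        monitor.recordHeartbeat("groom1", 0L);
        monitor.recordHeartbeat("groom2", 0L);
        monitor.recordHeartbeat("groom3", 0L);
        monitor.assignTask("groom2", "task-a");
        monitor.assignTask("groom2", "task-b");
        // groom1 and groom3 keep reporting; groom2 goes silent.
        monitor.recordHeartbeat("groom1", 8_000L);
        monitor.recordHeartbeat("groom3", 8_000L);
        System.out.println(monitor.failOver(12_000L)); // [groom2]
        System.out.println(monitor.tasksOf("groom1")); // [task-a]
        System.out.println(monitor.tasksOf("groom3")); // [task-b]
    }
}
```

Timestamps are passed in explicitly rather than read from a clock so the detection rule is easy to follow and test; a real master would typically check heartbeats periodically and would also need to restart the reassigned tasks, which this sketch leaves out.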
