cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "ArchitectureOverview" by tuxracer69
Date Sat, 14 Nov 2009 14:57:10 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "ArchitectureOverview" page has been changed by tuxracer69.
http://wiki.apache.org/cassandra/ArchitectureOverview?action=diff&rev1=1&rev2=2

--------------------------------------------------

  Architecture details
  
  
- O(1) node lookup Explicit replication Eventually consistent
+  * O(1) node lookup 
+  * Explicit replication 
+  * Eventually consistent
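The first two properties can be illustrated with a small token ring. This is a minimal sketch under stated assumptions:
 * Keys get an MD5-based token, in the style of a random partitioner.
 * Replicas are placed on the owning node plus the next N-1 nodes clockwise.
 * `TokenRing`, `replicas_for` and the node names are hypothetical.

Because every node holds the full ring, a key is routed to its replicas in a single hop. The local search over the sorted tokens is an ordinary binary search.

```python
from __future__ import annotations

import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a ring position (MD5, as a random partitioner might)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens: dict[str, int]) -> None:
        # Sort nodes by token so a key can be located with a binary search.
        pairs = sorted((t, n) for n, t in node_tokens.items())
        self.tokens = [t for t, _ in pairs]
        self.nodes = [n for _, n in pairs]

    def replicas_for(self, key: str, n: int) -> list[str]:
        """Return the owner of `key` plus the next n-1 nodes clockwise."""
        n = min(n, len(self.nodes))
        # First node whose token is >= the key's token; wrap past the end.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.tokens)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(n)]
```

For example, `TokenRing({"A": 0, "B": 2**126, "C": 2**127}).replicas_for("user:42", 2)` returns two distinct, consecutive nodes on the ring.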
  
  
  
  
  
  Architecture layers
- Messaging service Gossip Failure detection Cluster state Partitioner Replication Commit log Memtable SSTable Indexes Compaction Tombstones Hinted handoff Read repair Bootstrap Monitoring Admin tools
  
- Writes
+ 
+  * Messaging service 
+  * Gossip 
+  * Failure detection 
+  * Cluster state 
+  * Partitioner 
+  * Replication 
+ 
+  * Commit log 
+  * Memtable 
+  * SSTable 
+  * Indexes 
+  * Compaction 
+ 
+  * Tombstones 
+  * Hinted handoff 
+  * Read repair 
+  * Bootstrap 
+  * Monitoring 
+  * Admin tools
+ 
+ == Writes ==
  
  
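  Write flow: any node accepts the write, the partitioner picks the replicas, and each replica appends the write to its commit log and memtable (memtables are later flushed to SSTables, which are compacted). The receiving node waits for W responses.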
  
  
+ Write model:
  
+ There are two write modes:
+  * ''Quorum write'': blocks until quorum is reached
+  * ''Async write'': sends the request to any node. That node pushes the data to the appropriate nodes but returns to the client immediately.
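A quorum write can be sketched as sending the mutation to every replica in parallel and returning once W of them acknowledge. This is a minimal sketch using `concurrent.futures`, with some assumptions:
 * The `replicas` callables are hypothetical stand-ins for network writes that return True on success.
 * W = N // 2 + 1 is the usual majority quorum.

An async write would submit the same calls and return without waiting on any of them.

```python
from __future__ import annotations

import concurrent.futures as cf
from typing import Callable


def quorum(n: int) -> int:
    """Majority quorum for a replication factor of n."""
    return n // 2 + 1


def quorum_write(replicas: list[Callable[[], bool]], w: int,
                 timeout: float = 1.0) -> bool:
    """Send the write to every replica; return True once w of them acknowledge.

    Each replica callable is a hypothetical stand-in for a network write.
    Replicas that have not answered keep running in the background.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(r) for r in replicas]
    acks = 0
    try:
        for fut in cf.as_completed(futures, timeout=timeout):
            # A replica that raised or returned False does not count as an ack.
            if fut.exception() is None and fut.result():
                acks += 1
                if acks >= w:
                    return True
        return False
    except cf.TimeoutError:
        return False
    finally:
        # Do not block the caller on slow replicas.
        pool.shutdown(wait=False)
```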
  
  
+ If a node is down, the write goes to another node with a hint saying where it should be written to. A harvester runs every 15 minutes, finds the hints, and moves the data to the appropriate node.
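The hinted handoff described above can be sketched in two parts. A stand-in node stores each write for a down replica together with a hint naming the intended target. A harvester pass then replays the hints whose target is reachable again.

This is a minimal in-memory sketch:
 * `HintStore`, `is_alive` and `deliver` are hypothetical names.
 * Scheduling the pass every 15 minutes is left to the caller.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Hint:
    target: str  # node the data should have been written to
    key: str
    value: str


@dataclass
class HintStore:
    """Hints held by a stand-in node for replicas that were down."""
    hints: list[Hint] = field(default_factory=list)

    def store(self, target: str, key: str, value: str) -> None:
        self.hints.append(Hint(target, key, value))

    def harvest(self, is_alive: Callable[[str], bool],
                deliver: Callable[[Hint], None]) -> int:
        """One harvester pass: replay hints whose target is alive again.

        Returns the number of hints delivered; hints for nodes that are
        still down are kept for the next pass.
        """
        pending: list[Hint] = []
        delivered = 0
        for hint in self.hints:
            if is_alive(hint.target):
                deliver(hint)
                delivered += 1
            else:
                pending.append(hint)
        self.hints = pending
        return delivered
```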
  
+ === Write path ===
+ At write time:
+  * you first write to a '''disk commit log''' (sequential)
+  * After the write is logged, it is sent to the appropriate nodes
+  * Each node receiving a write first records it in a local log, then updates the appropriate '''memtables''' (one for each column family). A memtable is Cassandra's in-memory representation of key/value pairs before the data is flushed to disk as an SSTable.
+  * '''Memtables''' are flushed to disk when:
+    * Out of space
+    * Too many keys (128 is default)
+    * Time duration (client provided – no cluster clock)
+  * When a memtable is written out, two files are produced:
+    * Data File ('''SSTable'''). An SSTable (Sorted String Table, terminology borrowed from Google) is a file of key/value string pairs, sorted by key.
+    * Index File ('''SSTable Index'''). (Similar to Hadoop !MapFile / TFile)
+      * (Key, offset) pairs (points into data file)
+      * Bloom filter (all keys in data file)
+  * When a commit log has had all its column families pushed to disk, it is deleted
+  * '''Compaction''': Data files accumulate over time. Periodically, data files are merge-sorted into a new file (and a new index is created)
+    * Merge keys 
+    * Combine columns 
+    * Discard tombstones
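The write path above can be sketched for a single node with a single column family. The write is appended to a sequential commit log and the memtable is updated. Once a key-count threshold is hit, the memtable is flushed to a sorted data file plus a (key, offset) index, and the commit log is discarded.

This is a minimal sketch under stated assumptions:
 * The "files" are in-memory bytes.
 * Records are `key\tvalue\n` lines, so keys and values must not contain tabs or newlines.
 * `Node`, `SSTable` and `MAX_KEYS` are hypothetical names, not Cassandra's on-disk format.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass


@dataclass
class SSTable:
    """A flushed memtable: a data file plus its (key, offset) index."""
    data: bytes                   # sorted "key\tvalue\n" records
    index: list[tuple[str, int]]  # (key, byte offset into data), sorted

    def get(self, key: str) -> str | None:
        keys = [k for k, _ in self.index]
        i = bisect.bisect_left(keys, key)  # index is sorted: binary search
        if i == len(keys) or keys[i] != key:
            return None
        offset = self.index[i][1]
        end = self.data.index(b"\n", offset)
        _, value = self.data[offset:end].decode("utf-8").split("\t", 1)
        return value


class Node:
    """One node with a single column family (hypothetical simplification)."""

    MAX_KEYS = 128  # flush threshold; the text gives 128 keys as the default

    def __init__(self) -> None:
        self.commit_log: list[tuple[str, str]] = []  # sequential, append-only
        self.memtable: dict[str, str] = {}
        self.sstables: list[SSTable] = []

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. append to the commit log
        self.memtable[key] = value            # 2. update the memtable
        if len(self.memtable) >= self.MAX_KEYS:
            self.flush()

    def flush(self) -> None:
        """Write the memtable out as a sorted data file and its index."""
        data, index = bytearray(), []
        for key in sorted(self.memtable):
            index.append((key, len(data)))
            data += f"{key}\t{self.memtable[key]}\n".encode("utf-8")
        self.sstables.append(SSTable(bytes(data), index))
        self.memtable.clear()
        # Every logged write is now in an SSTable, so the log can be deleted.
        self.commit_log.clear()

    def read(self, key: str) -> str | None:
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            value = table.get(key)
            if value is not None:
                return value
        return None
```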
  
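The Bloom filter stored with each SSTable index lets a read skip data files that cannot contain a key. A negative answer is exact; a positive answer may be a false positive.

This is a minimal sketch. The bit-array size, the number of hash functions, and the double-hashing scheme over one SHA-256 digest are assumptions, not Cassandra's parameters.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> list:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, never zero
        # Double hashing: the i-th bit position is (h1 + i * h2) mod m.
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))
```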
  
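Compaction as listed (merge keys, combine columns, discard tombstones) can be sketched as a merge of several sorted SSTables. For each column, the value with the newest timestamp wins.

This is a minimal sketch under stated assumptions:
 * An SSTable is a sorted list of `(key, {column: (timestamp, value)})` rows.
 * A tombstone is a `None` value.

Dropping tombstones unconditionally, as done here, only works in this simplified setting. The Remove section explains why a cluster needs a delay before tombstone GC.

```python
from __future__ import annotations

import heapq
from typing import Optional

# Hypothetical row layout: column -> (timestamp, value); None is a tombstone.
Row = dict[str, tuple[int, Optional[str]]]


def compact(*sstables: list[tuple[str, Row]]) -> list[tuple[str, Row]]:
    """Merge sorted SSTables into one sorted table."""
    merged: list[tuple[str, Row]] = []
    # heapq.merge walks all inputs in key order, like a merge sort.
    for key, row in heapq.merge(*sstables, key=lambda kv: kv[0]):
        if merged and merged[-1][0] == key:
            target = merged[-1][1]  # merge keys: same key from another file
        else:
            target = {}
            merged.append((key, target))
        for col, (ts, value) in row.items():
            # Combine columns: the newest timestamp wins.
            if col not in target or ts > target[col][0]:
                target[col] = (ts, value)
    # Discard tombstones, and rows left with no live columns.
    result: list[tuple[str, Row]] = []
    for key, row in merged:
        live = {c: cell for c, cell in row.items() if cell[1] is not None}
        if live:
            result.append((key, live))
    return result
```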
  
  
  
+ == Remove ==
- Memtable / SSTable
- 
- Disk
- Commit log
- 
- SSTable format
- 
- 
- Key / data
- 
- SSTable Indexes
- 
- 
- Bloom filter Key Column
- 
- 
- 
- 
- 
- (Similar to Hadoop MapFile / Tfile)
- 
- Compaction
- 
- 
- Merge keys Combine columns Discard tombstones
- 
- 
- 
- 
- 
- Remove
  
  
  A deletion marker (tombstone) is necessary to suppress data in older SSTables until compaction. Read repair complicates things a little, and eventual consistency complicates things more. Solution: a configurable delay before tombstone GC, after which tombstones are no longer repaired.
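The tombstone rule can be sketched in three steps:
 * A delete writes a timestamped marker instead of removing data.
 * Reads treat any value older than the marker as deleted.
 * Compaction may purge a tombstone only once it is older than a configurable grace period.

The grace period exists so that a replica which missed the delete cannot bring the data back through repair while the tombstone still exists elsewhere. This is a minimal sketch: `GC_GRACE_SECONDS`, its 10-day value, and both function names are hypothetical, and timestamps are assumed to be in seconds.

```python
from __future__ import annotations

from typing import Optional

GC_GRACE_SECONDS = 10 * 24 * 3600  # hypothetical configurable delay


def visible_value(cells: list[tuple[int, Optional[str]]]) -> Optional[str]:
    """Resolve one column across SSTables: newest timestamp wins.

    A value of None is a tombstone, so a newer tombstone hides older data.
    """
    if not cells:
        return None
    _, value = max(cells, key=lambda c: c[0])
    return value


def can_purge(tombstone_ts: int, now: int,
              grace: int = GC_GRACE_SECONDS) -> bool:
    """A tombstone may be dropped by compaction only after the grace period."""
    return now - tombstone_ts >= grace
```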
@@ -154, +171 @@

  
  
  
- Read path
+ == Read path ==
  
  
  Read flow: any node accepts the read, the partitioner locates the replicas, and the node waits for R responses. It then waits for the remaining N - R responses in the background and performs read repair.
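The read path can be sketched in three steps:
 * Query the replicas.
 * Answer from the newest of the first R responses.
 * Check all N responses and write the newest version back to any replica that is missing or stale (read repair).

This is a minimal single-threaded sketch under stated assumptions:
 * Replicas are plain dicts mapping a key to `(timestamp, value)`.
 * The "background" step runs inline after the result is chosen.
 * `quorum_read` is a hypothetical name.

```python
from __future__ import annotations

from typing import Optional

Cell = tuple[int, str]  # (timestamp, value)


def quorum_read(replicas: list[dict[str, Cell]], key: str,
                r: int) -> Optional[str]:
    """Answer from the first r responses, then repair stale replicas."""
    responses = [rep.get(key) for rep in replicas]
    first_r = [c for c in responses[:r] if c is not None]
    # If the newest version sits only on the other N - R replicas, this
    # answer is stale: the read is only eventually consistent.
    result = max(first_r, key=lambda c: c[0])[1] if first_r else None

    # "Background" step, run inline here: compare all N responses and push
    # the newest version to every replica that is missing or behind.
    known = [c for c in responses if c is not None]
    if known:
        newest = max(known, key=lambda c: c[0])
        for rep, cell in zip(replicas, responses):
            if cell is None or cell[0] < newest[0]:
                rep[key] = newest
    return result
```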
