cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "FAQ" by JonathanEllis
Date Thu, 19 Nov 2009 16:30:46 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "FAQ" page has been changed by JonathanEllis.
The comment on this change is: move hardware to separate page.
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=28&rev2=29

--------------------------------------------------

  <<Anchor(what_kind_of_hardware_should_i_use)>>
  == What kind of hardware should I run Cassandra on? ==
  
+ See [CassandraHardware].
- === Memory ===
- The most recently written data resides in memory tables (aka [[MemtableThresholds|memtables]]), but older data that has been flushed to disk can be kept in the OS's file-system cache. In other words, ''the more memory, the better'', with 1GB being the minimum recommended.
- 
- === CPU ===
- Many workloads will become CPU-bound in Cassandra before they become memory-bound. Cassandra is highly concurrent and will make good use of however many cores you give it.
- 
- 
- === Disk ===
- The short answer is ''at least two disks'': one for your `CommitLogDirectory` and another for your `DataFileDirectories`. The exact answer, though, depends heavily on your usage, so it is important to understand what is going on here.
- 
- Cassandra persists data to disk for two very different purposes. First, every new write is appended to the commit log so that it can be replayed after a crash or system shutdown. Second, when thresholds are exceeded, memtables are flushed to disk as SSTables.
- 
- Commit logs receive every write made to a Cassandra node and have the potential to block client operations, but they are only ever read on node start-up. SSTables, on the other hand, are written asynchronously but are read to satisfy client look-ups. SSTables are also periodically merged and rewritten in a process called ''compaction''. Another important distinction is that commit logs are purged after the corresponding data has been flushed to disk as an SSTable, so `CommitLogDirectory` only holds data that has not yet been flushed, while the directories in `DataFileDirectories` store all of the data written to a node.
- 
- To summarize: use a separate device for your `CommitLogDirectory`; it needn't be large, but it should be fast enough to receive all of your writes. Then use one or more devices for `DataFileDirectories`, and make sure they are both large enough to hold all of your data and fast enough to satisfy your reads while keeping up with flushing and compaction.
  
  <<Anchor(architecture)>>
  == What are SSTables and Memtables? ==
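The write path described in the removed ''Disk'' text (every write appended to the commit log, memtables flushed to SSTables once a threshold is exceeded, commit log entries purged after the flush, and the log replayed on start-up) can be modelled with a toy Python sketch. Everything here is hypothetical and simplified: `ToyNode`, `memtable_threshold` and the in-memory lists standing in for `CommitLogDirectory` and `DataFileDirectories` are illustrative names, not Cassandra's API, and compaction is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical, simplified model of a node's write path (not Cassandra's API)."""
    memtable_threshold: int  # hypothetical: flush once the memtable holds this many keys
    commit_log: list = field(default_factory=list)  # stands in for CommitLogDirectory
    memtable: dict = field(default_factory=dict)    # in-memory table of recent writes
    sstables: list = field(default_factory=list)    # stands in for DataFileDirectories

    def write(self, key, value):
        # 1. Durability first: every write is appended to the commit log.
        self.commit_log.append((key, value))
        # 2. The write is then applied to the memtable.
        self.memtable[key] = value
        # 3. Once the threshold is reached, flush the memtable to an SSTable.
        if len(self.memtable) >= self.memtable_threshold:
            self.flush()

    def flush(self):
        # An SSTable is an immutable, key-sorted snapshot of the memtable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        # Everything logged since the last flush is now in an SSTable,
        # so the commit log entries can be purged.
        self.commit_log.clear()

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables newest-first.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

    def replay(self):
        # On start-up, rebuild the memtable from the unflushed commit log entries.
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value
```

For example, with `memtable_threshold=2`, writing `"a"` and `"b"` produces one SSTable and an empty commit log; a later write of `"c"` survives a simulated crash (`node.memtable = {}`) because `replay()` restores it from the commit log. In this toy both steps run synchronously; in Cassandra, per the text above, only the commit log append sits on the client's write path.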
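The removed advice to keep `CommitLogDirectory` and `DataFileDirectories` on different devices can be spot-checked with a short Python sketch. The helper names `same_device` and `check_layout` are hypothetical. Comparing `os.stat().st_dev` only tells you whether two directories are on the same filesystem; two partitions of one physical disk report different values, so this is a coarse sanity check, not a guarantee of separate disks.

```python
import os


def same_device(path_a: str, path_b: str) -> bool:
    """Return True if both paths are on the same filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def check_layout(commit_log_dir: str, data_dirs: list) -> list:
    """Return a warning for each data directory sharing a filesystem with the commit log.

    Hypothetical helper: it cannot detect two partitions of the same physical disk.
    """
    warnings = []
    for data_dir in data_dirs:
        if same_device(commit_log_dir, data_dir):
            warnings.append(f"{data_dir} shares a device with commit log {commit_log_dir}")
    return warnings
```

You would call it with the directories from your own configuration, e.g. `check_layout("/var/lib/cassandra/commitlog", ["/var/lib/cassandra/data"])` (example paths only); an empty list means no data directory shares the commit log's filesystem.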
