cassandra-commits mailing list archives

From Apache Wiki <>
Subject [Cassandra Wiki] Update of "CassandraHardware" by JonathanEllis
Date Thu, 19 Nov 2009 16:31:07 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "CassandraHardware" page has been changed by JonathanEllis.


New page:
=== Memory ===
The most recently written data resides in memory tables (aka [[MemtableThresholds|memtables]]),
but older data that has been flushed to disk can be kept in the OS's file-system cache. In
other words, ''the more memory, the better'', with 1GB being the minimum recommended.

=== CPU ===
Many workloads will actually be CPU-bound in Cassandra before being memory-bound.  Cassandra
is highly concurrent and will make good use of however many cores you can give it.

=== Disk ===
The short answer here is ''at least 2 disks'': one to keep your `CommitLogDirectory` on,
the other to use in `DataFileDirectories`. The exact answer, though, depends a lot on your usage,
so it's important to understand what is going on here.

Cassandra persists data to disk for two very different purposes. The first is when a new write
is made: it is recorded in the commit log so that it can be replayed after a crash or system
shutdown. The second is when thresholds are exceeded and memtables are flushed to disk as SSTables.

Commit logs receive every write made to a Cassandra node and have the potential to block client
operations, but they are only ever read on node start-up. SSTable writes, on the other hand,
occur asynchronously, but SSTables are read to satisfy client look-ups. SSTables are also
periodically merged and rewritten in a process called ''compaction''. Another important
distinction is that commit logs are purged after the corresponding data has been flushed to disk
as an SSTable, so `CommitLogDirectory` only holds data that has not yet been flushed, while the
directories in `DataFileDirectories` store all of the data written to a node.
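
The life cycle described above can be illustrated with a toy model. This is only a sketch
for intuition, not Cassandra's actual code: the `ToyNode` class, its method names and the
`flush_threshold` parameter are all hypothetical, the lists stand in for files on disk, and the
flush runs synchronously here whereas the real one is asynchronous.

```python
class ToyNode:
    """Toy model of the write path; every name here is hypothetical."""

    def __init__(self, flush_threshold):
        # Assumption: flush once the memtable holds this many keys.
        self.flush_threshold = flush_threshold
        self.commit_log = []  # stands in for CommitLogDirectory
        self.memtable = {}    # most recently written data, in memory
        self.sstables = []    # stands in for DataFileDirectories

    def write(self, key, value):
        # Every write goes to the commit log so it can be replayed later.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # The memtable is written out as a new SSTable...
        self.sstables.append(dict(self.memtable))
        self.memtable = {}
        # ...after which the commit log entries it covered can be purged.
        self.commit_log = []

    def read(self, key):
        # Look-ups check the memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None

    def replay(self):
        # On start-up, rebuild the memtable from whatever the commit log holds.
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value
```

In this model the commit log never grows beyond the writes made since the last flush, while
`sstables` keeps accumulating, which is why the commit log device can be small but the data
devices cannot.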

So to summarize, use a separate device for your `CommitLogDirectory`; it needn't be large,
but it should be fast enough to receive all of your writes. Then, use one or more devices
for `DataFileDirectories` and make sure they are large enough to house all of your data
and fast enough to satisfy your reads and to keep up with flushing and compaction.
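
One rough way to check this layout is to compare the device IDs the operating system reports
for each directory. The sketch below is an assumption-laden illustration, not a Cassandra tool:
the function names are hypothetical, and `st_dev` only identifies the file system a path lives on,
so two partitions of the same physical disk would still look like different devices.

```python
import os


def device_of(path):
    """Return the ID of the device (file system) that holds `path`."""
    return os.stat(path).st_dev


def shares_device(commit_log_dir, data_dirs):
    """Return the data directories on the same device as the commit log.

    An empty list means no overlap was detected. This is a necessary check,
    not a sufficient one: distinct partitions on one disk are not detected.
    """
    commit_dev = device_of(commit_log_dir)
    return [d for d in data_dirs if device_of(d) == commit_dev]
```

Calling `shares_device` with your `CommitLogDirectory` path and the list of paths from
`DataFileDirectories` returns any data directory that shares a device with the commit log.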
