cassandra-commits mailing list archives

From Apache Wiki <>
Subject [Cassandra Wiki] Update of "API07" by ToddBlose
Date Thu, 22 Apr 2010 06:16:56 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "API07" page has been changed by ToddBlose.


New page:
## page was copied from API
== Overview ==
The Cassandra Thrift API changed between [[API03|0.3]], [[API04|0.4]], [[API|0.5]] and 0.6;
this document explains the 0.6 version.

Cassandra's client API is built entirely on top of Thrift. Note that this document mentions
default values, but these are not generated in all of the languages that Thrift supports.
 Full examples of using Cassandra from Thrift, including setup boilerplate, are found on ThriftExamples.
 Higher-level clients are linked from ClientOptions.

'''WARNING:''' Some SQL/RDBMS terms are used in this documentation for analogy purposes. They
should be thought of as just that: analogies. There are few similarities between how data
is managed in a traditional RDBMS and in Cassandra. Please see DataModel for more information.

== Terminology / Abbreviations ==
 Keyspace:: Contains multiple Column Families.
 CF:: !ColumnFamily.
 SCF:: !ColumnFamily of type "Super".
 Key:: A unique string that identifies a row in a CF.  For clarity, rows are always identified
by keys; columns are identified by names.  Note that Thrift's Java code [i.e., the Cassandra server]
assumes that Strings are always encoded as UTF-8, but if you are using a non-Java client,
you may need to manually encode non-ASCII strings as UTF-8 first.  (This is the major place
where Thrift does not support interoperability between different platforms well.)
 Column:: A tuple of name, value, and timestamp; names are unique within rows.
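
As an illustration of the UTF-8 note above, here is a minimal sketch in plain Python. The helper name `encode_key` is hypothetical and is not part of any Cassandra or Thrift client; it only shows a non-ASCII key being encoded explicitly before it is handed to a client library.

```python
def encode_key(key: str) -> bytes:
    """Encode a row key as UTF-8, the encoding the Java side assumes."""
    return key.encode("utf-8")


# "caf\u00e9" has 4 characters; the accented letter needs 2 bytes in UTF-8.
encoded = encode_key("caf\u00e9")
print(len("caf\u00e9"), len(encoded))
```

Whether a given client library performs this step for you depends on the language binding, so check before encoding twice.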

== Exceptions ==
 NotFoundException:: A specific column was requested that does not exist.
 InvalidRequestException:: Invalid request could mean keyspace or column family does not exist,
required parameters are missing, or a parameter is malformed. `why` contains an associated
error message.
 UnavailableException:: Not all the replicas required could be created and/or read.
 TimedOutException:: The node responsible for the write or read did not respond during the
rpc interval specified in your configuration (default 10s).  This can happen if the request
is too large, the node is oversaturated with requests, or the node is down but the failure
detector has not yet realized it (usually this takes < 30s).
 TApplicationException:: Internal server error or invalid Thrift method (possible if you are
using an older version of a Thrift client with a newer build of the Cassandra server).
 AuthenticationException:: Invalid authentication request (user does not exist or credentials
are invalid).
 AuthorizationException:: Invalid authorization request (user does not have access to keyspace).

== Structures ==
=== ConsistencyLevel ===
The `ConsistencyLevel` is an `enum` that controls both read and write behavior based on `<ReplicationFactor>`
in your `storage-conf.xml`. The different consistency levels have different meanings, depending
on whether you're doing a write or a read operation.  Note that if `W` + `R` > `ReplicationFactor`,
where W is the number of nodes to block for on write, and R the number to block for on reads,
you will have strongly consistent behavior; that is, readers will always see the most recent
write.  Of these, the most interesting is to do `QUORUM` reads and writes, which gives you
consistency while still allowing availability in the face of node failures up to half of `ReplicationFactor`.
 Of course if latency is more important than consistency then you can use lower values for
either or both.
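
The `W` + `R` > `ReplicationFactor` rule can be checked with a few lines of arithmetic. The following is a minimal sketch in plain Python; the function names are hypothetical, and the quorum size uses the `<ReplicationFactor> / 2 + 1` formula from the write table below with integer division.

```python
def quorum(replication_factor: int) -> int:
    """Replicas a QUORUM operation blocks for: RF / 2 + 1 (integer division)."""
    return replication_factor // 2 + 1


def is_strongly_consistent(w: int, r: int, replication_factor: int) -> bool:
    """True when every read set overlaps every write set, i.e. W + R > RF."""
    return w + r > replication_factor


rf = 3
# QUORUM writes combined with QUORUM reads
print(is_strongly_consistent(quorum(rf), quorum(rf), rf))
# ONE writes combined with ONE reads
print(is_strongly_consistent(1, 1, rf))
```

With `ReplicationFactor` 3, a quorum is 2 nodes, so quorum reads and writes overlap on at least one replica, while `ONE`/`ONE` does not guarantee any overlap.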

All discussion of "nodes" here refers to nodes responsible for holding data for the given
key; "surrogate" nodes involved in HintedHandoff do not count towards achieving the requested
`ConsistencyLevel`.

==== Write ====
||'''Level''' ||'''Behavior''' ||
||`ZERO` ||Ensure nothing. A write happens asynchronously in background ||
||`ANY` ||Ensure that the write has been written to at least 1 node, including hinted recipients. ||
||`ONE` ||Ensure that the write has been written to at least 1 node's commit log and memory
table before responding to the client. ||
||`QUORUM` ||Ensure that the write has been written to `<ReplicationFactor> / 2 + 1`
nodes before responding to the client. ||
||`ALL` ||Ensure that the write is written to all `<ReplicationFactor>` nodes before
responding to the client.  Any unresponsive nodes will fail the operation. ||

==== Read ====
||'''Level''' ||'''Behavior''' ||
||`ZERO` ||Not supported, because it doesn't make sense. ||
||`ANY` ||Not supported. You probably want ONE instead. ||
||`ONE` ||Will return the record returned by the first node to respond. A consistency check
is always done in a background thread to fix any consistency issues when `ConsistencyLevel.ONE`
is used. This means subsequent calls will have correct data even if the initial read gets
an older value.  (This is called `read repair`.) ||
||`QUORUM` ||Will query all nodes and return the record with the most recent timestamp once
it has at least a majority of replicas reported.  Again, the remaining replicas will be checked
in the background. ||
||`ALL` ||Will query all nodes and return the record with the most recent timestamp once all
nodes have replied.  Any unresponsive nodes will fail the operation. ||

'''Note: '''Different language toolkits may have their own Consistency Level defaults as well.
To ensure the desired Consistency Level, you should always explicitly set the Consistency
Level.

=== ColumnOrSuperColumn ===
Due to the lack of inheritance in Thrift, `Column` and `SuperColumn` structures are aggregated
by the `ColumnOrSuperColumn` structure. This is used wherever either a `Column` or `SuperColumn`
would normally be expected.

If the underlying column is a `Column`, it will be contained within the `column` attribute.
If the underlying column is a `SuperColumn`, it will be contained within the `super_column`
attribute. The two are mutually exclusive - i.e. only one may be populated.
||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`column` ||`Column` ||n/a ||N ||The `Column` if this `ColumnOrSuperColumn` is aggregating
a `Column`. ||
||`super_column` ||`SuperColumn` ||n/a ||N ||The `SuperColumn` if this `ColumnOrSuperColumn`
is aggregating a `SuperColumn`. ||
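
The "only one may be populated" rule can be modelled as follows. This is a sketch using plain Python dataclasses as stand-ins for the Thrift-generated structures; the `unwrap` helper is hypothetical and does not exist in the generated code.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Column:
    name: bytes
    value: bytes
    timestamp: int


@dataclass
class SuperColumn:
    name: bytes
    columns: List[Column]


@dataclass
class ColumnOrSuperColumn:
    column: Optional[Column] = None
    super_column: Optional[SuperColumn] = None

    def unwrap(self) -> Union[Column, SuperColumn]:
        """Return whichever member is populated; reject zero or two."""
        if (self.column is None) == (self.super_column is None):
            raise ValueError("exactly one of column / super_column must be set")
        return self.column if self.column is not None else self.super_column


cosc = ColumnOrSuperColumn(column=Column(b"name", b"value", 1))
print(type(cosc.unwrap()).__name__)
```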

=== Column ===
The `Column` is a triplet of a name, value and timestamp. As described above, `Column` names
are unique within a row. Timestamps are arbitrary: they can be any integer you specify, but
they must be consistent across your application. It is recommended to use a timestamp value
with a fine granularity, such as milliseconds since the UNIX epoch. See DataModel for more
details.
||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`name` ||`binary` ||n/a ||Y ||The name of the `Column`. ||
||`value` ||`binary` ||n/a ||Y ||The value of the `Column`. ||
||`timestamp` ||`i64` ||n/a ||Y ||The timestamp of the `Column`. ||
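
One way to produce the recommended millisecond timestamp is sketched below in Python. The helper name `now_millis` is hypothetical; the only requirements taken from the text above are that the value is an integer that fits in an `i64` and that every writer in the application uses the same unit.

```python
import time


def now_millis() -> int:
    """Milliseconds since the UNIX epoch, as an integer suitable for an i64."""
    return int(time.time() * 1000)


ts = now_millis()
print(ts)
```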

=== SuperColumn ===
A `SuperColumn` contains no data itself, but instead stores another level of `Columns` below
the key. See DataModel for more details on what `SuperColumns` are and how they should be
used.
||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`name` ||`binary` ||n/a ||Y ||The name of the `SuperColumn`. ||
||`columns` ||`list<Column>` ||n/a ||Y ||The `Columns` within the `SuperColumn`. ||

=== ColumnPath ===
The `ColumnPath` is the path to a single column in Cassandra. It might make sense to think
of `ColumnPath` and `ColumnParent` in terms of a directory structure.
||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`column_family` ||`string` ||n/a ||Y ||The name of the CF of the column being looked up. ||
||`super_column` ||`binary` ||n/a ||N ||The super column name. ||
||`column` ||`binary` ||n/a ||N ||The column name. ||

=== ColumnParent ===
The `ColumnParent` is the path to the parent of a particular set of `Columns`. It is used
when selecting groups of columns from the same !ColumnFamily. In directory structure terms,
imagine `ColumnParent` as `ColumnPath + '/../'`.
||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`column_family` ||`string` ||n/a ||Y ||The name of the CF of the column being looked up. ||
||`super_column` ||`binary` ||n/a ||N ||The super column name. ||
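
The directory analogy can be made concrete with a small sketch. The dataclasses below are plain Python stand-ins for the Thrift structures, and `parent_of` is a hypothetical helper that plays the role of `ColumnPath + '/../'`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnPath:
    column_family: str
    super_column: Optional[bytes] = None
    column: Optional[bytes] = None


@dataclass(frozen=True)
class ColumnParent:
    column_family: str
    super_column: Optional[bytes] = None


def parent_of(path: ColumnPath) -> ColumnParent:
    """Drop the last path component, like appending '/../' to a file path."""
    if path.column is None:
        # The path points at a super column (or the CF); its parent is the CF.
        return ColumnParent(path.column_family)
    return ColumnParent(path.column_family, path.super_column)


print(parent_of(ColumnPath("Standard1", column=b"age")))
print(parent_of(ColumnPath("Super1", super_column=b"sc", column=b"age")))
```

The column family names `Standard1` and `Super1` are only illustrative values.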

=== SlicePredicate ===
A `SlicePredicate` is similar to a [[|mathematical
predicate]], which is described as "a property that the elements of a set have in common."

A `SlicePredicate` in Cassandra is described with either a list of `column_names` or a `SliceRange`.
||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`column_names` ||`list<binary>` ||n/a ||N ||A list of column names to retrieve. This
can be used similar to Memcached's "multi-get" feature to fetch N known column names. For
instance, if you know you wish to fetch columns 'Joe', 'Jack', and 'Jim' you can pass those
column names as a list to fetch all three at once. ||
||`slice_range` ||`SliceRange` ||n/a ||N ||A `SliceRange` describing how to range, order,
and/or limit the slice. ||

If `column_names` is specified, `slice_range` is ignored.

=== SliceRange ===
A `SliceRange` is a structure that stores basic range, ordering and limit information for
a query that will return multiple columns. It could be thought of as Cassandra's version of
`LIMIT` and `ORDER BY` from SQL.
||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`start` ||`binary` ||n/a ||Y ||The column name to start the slice with. This attribute must
be set, though there is no default value; it can be safely set to `''`, i.e., an empty
byte array, to start with the first column name.  Otherwise, it must be a valid value under
the rules of the Comparator defined for the given `ColumnFamily`. ||
||`finish` ||`binary` ||n/a ||Y ||The column name to stop the slice at. This attribute must
be set, though there is no default value; it can be safely set to an empty byte array
to not stop until `count` results are seen. Otherwise, it must also be a valid value under the
`ColumnFamily` Comparator. ||
||`reversed` ||`bool` ||`false` ||Y ||Whether the results should be returned in reversed order.
Similar to `ORDER BY blah DESC` in SQL. ||
||`count` ||`i32` ||`100` ||Y ||How many columns to return. Similar to `LIMIT 100` in
SQL. May be arbitrarily large, but Thrift will materialize the whole result into memory before
returning it to the client, so be aware that you may be better served by iterating through
slices by passing the last value of one call in as the `start` of the next instead of increasing
`count` arbitrarily large. ||
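
The paging approach described for `count` can be sketched as follows. This is plain Python against an in-memory stand-in, not a real Thrift call: the `get_slice` function here is a hypothetical substitute for the server, and it assumes `start` is inclusive, which is why every page after the first drops its first column.

```python
from typing import Dict, Iterator, List, Tuple

Row = Dict[bytes, bytes]


def get_slice(row: Row, start: bytes, count: int) -> List[Tuple[bytes, bytes]]:
    """In-memory stand-in: up to `count` columns with name >= start, in name order."""
    names = sorted(name for name in row if name >= start)
    return [(name, row[name]) for name in names[:count]]


def iter_columns(row: Row, page_size: int = 100) -> Iterator[Tuple[bytes, bytes]]:
    """Walk a wide row page by page instead of asking for one huge slice."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start = b""          # empty byte array: begin at the first column
    first_page = True
    while True:
        page = get_slice(row, start, page_size)
        # start is inclusive, so later pages repeat the previous last column
        yield from (page if first_page else page[1:])
        if len(page) < page_size:
            return
        start = page[-1][0]
        first_page = False


row = {bytes([ord("a") + i]): b"x" for i in range(7)}
print([name for name, _ in iter_columns(row, page_size=3)])
```

Each request stays small, so neither side has to materialize the whole row at once.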

=== KeyRange ===
A `KeyRange` is used by `get_range_slices` to define the range of keys to get the slices for.

The semantics of start keys and tokens are slightly different. Keys are start-inclusive; tokens
are start-exclusive. Token ranges may also wrap -- that is, the end token may be less than
the start one. Thus, a range from keyX to keyX is a one-element range, but a range from tokenY
to tokenY is the full ring.
||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`start_key` ||`string` ||n/a ||N ||The first key in the inclusive `KeyRange`. ||
||`end_key` ||`string` ||n/a ||N ||The last key in the inclusive `KeyRange`. ||
||`start_token` ||`string` ||n/a ||N ||The first token in the exclusive `KeyRange`. ||
||`end_token` ||`string` ||n/a ||N ||The last token in the exclusive `KeyRange`. ||
||`count` ||`i32` ||100 ||Y ||The total number of keys to permit in the `KeyRange`. ||
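
The difference between key ranges and token ranges can be sketched as two membership tests. This is an illustrative model in plain Python, not server code: tokens are shown as integers for readability (the API passes them as strings), and treating the end token as inclusive is an assumption of this sketch.

```python
def in_key_range(key: str, start_key: str, end_key: str) -> bool:
    """Keys: inclusive at both ends, no wrapping."""
    return start_key <= key <= end_key


def in_token_range(token: int, start_token: int, end_token: int) -> bool:
    """Tokens: start-exclusive, may wrap around the ring.

    End-inclusive behaviour is an assumption made for this sketch.
    """
    if start_token < end_token:
        return start_token < token <= end_token
    # start >= end: the range wraps; start == end covers the full ring
    return token > start_token or token <= end_token


print(in_key_range("keyX", "keyX", "keyX"))   # one-element range
print(in_token_range(42, 7, 7))               # tokenY to tokenY: full ring
```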

=== KeySlice ===

A `KeySlice` encapsulates a mapping of a key to the slice of columns for it as returned by
the get_range_slices operation. Normally, when slicing a single key, a `list<ColumnOrSuperColumn>`
of the slice would be returned. When slicing multiple or a range of keys, a `list<KeySlice>`
is instead returned so that each slice can be mapped to its key.
||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`key` ||`string` ||n/a ||Y ||The key for the slice. ||
||`columns` ||`list<ColumnOrSuperColumn>` ||n/a ||Y ||The columns in the slice. ||

=== TokenRange ===

A structure representing structural information about the cluster provided by the `describe`
utility methods detailed below.
||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`start_token` ||`string` ||n/a ||Y ||The first token in the `TokenRange`. ||
||`end_token` ||`string` ||n/a ||Y ||The last token in the `TokenRange`. ||
||`endpoints` ||`list<string>` ||n/a ||Y ||A list of the endpoints (nodes) that replicate
data in the `TokenRange`. ||

=== Mutation ===

A `Mutation` encapsulates either a column to insert, or a deletion to execute for a key. Like
`ColumnOrSuperColumn`, the two properties are mutually exclusive - you may only set one on
a Mutation.
||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`column_or_supercolumn` ||`ColumnOrSuperColumn` ||n/a ||N ||The column to insert into the
key. ||
||`deletion` ||`Deletion` ||n/a ||N ||The deletion to execute on the key. ||

=== Deletion ===

A `Deletion` encapsulates an operation that will delete all columns matching the specified
`timestamp` and `predicate`. If `super_column` is specified, the `Deletion` will operate on
columns within the `SuperColumn` - otherwise it will operate on columns in the top-level of
the key.
||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`timestamp` ||`i64` ||n/a ||Y ||The timestamp of the column(s) to be deleted. ||
||`super_column` ||`binary` ||n/a ||N ||The super column to delete the column(s) from. ||
||`predicate` ||`SlicePredicate` ||n/a ||N ||A predicate to match the column(s) to be deleted
from the key/super column. ||

=== AuthenticationRequest ===

A structure that encapsulates a request for the connection to be authenticated. The authentication
credentials are arbitrary - this structure simply provides a mapping of credential name to
credential value.
||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`credentials` ||`map<string, string>` ||n/a ||Y ||A map of named credentials. ||

== Method calls ==
=== login ===
 . `void login(keyspace, auth_request)`

Authenticates with the cluster for operations on the specified keyspace using the specified
`AuthenticationRequest` credentials. Throws `AuthenticationException` if the credentials are
invalid or `AuthorizationException` if the credentials are valid, but not for the specified
keyspace.

=== get ===
 . `ColumnOrSuperColumn get(keyspace, key, column_path, consistency_level)`

Get the `Column` or `SuperColumn` at the given `column_path`.  If no value is present, `NotFoundException`
is thrown.  (This is the only method that can throw an exception under non-failure conditions.)

=== get_slice ===
 . `list<ColumnOrSuperColumn> get_slice(keyspace, key, column_parent, predicate, consistency_level)`

Get the group of columns contained by `column_parent` (either a `ColumnFamily` name or a `ColumnFamily/SuperColumn`
name pair) specified by the given `SlicePredicate` struct.

=== multiget_slice ===
 . `map<string,list<ColumnOrSuperColumn>> multiget_slice(keyspace, keys, column_parent,
predicate, consistency_level)`

Retrieves slices for `column_parent` and `predicate` on each of the given keys in parallel.
`keys` is a `list<string>` of the keys to get slices for.

This is similar to `get_range_slice` (Cassandra 0.5) except operating on a set of non-contiguous
keys instead of a range of keys.

=== get_count ===
 . `i32 get_count(keyspace, key, column_parent, consistency_level)`

Counts the columns present in `column_parent`.

The method is not O(1). It reads all the columns from disk to calculate the answer. The only
benefit of the method is that you do not need to pull all the columns over the Thrift interface
to count them.

=== get_range_slices ===
 . `list<KeySlice> get_range_slices(keyspace, column_parent, predicate, range, consistency_level)`

Replaces `get_range_slice`. Returns a list of slices for the keys within the specified `KeyRange`.
Unlike get_key_range, this applies the given predicate to all keys in the range, not just
those with undeleted matching data.  This method is only allowed when using an order-preserving
partitioner.

=== insert ===
 . `insert(keyspace, key, column_path, value, timestamp, consistency_level)`

Insert a `Column` consisting of (`column_path.column`, `value`, `timestamp`) at the given
`column_path.column_family` and optional `column_path.super_column`.  Note that `column_path.column`
is here required, since a !SuperColumn cannot directly contain binary values -- it can only
contain sub-Columns.

=== batch_mutate ===
 . `batch_mutate(keyspace, mutation_map, consistency_level)`

Executes the specified mutations on the keyspace. `mutation_map` is a `map<string, map<string,
list<Mutation>>>`: the outer map is keyed by row key, and each inner map is keyed by column
family name and holds the list of `Mutation`s to apply to that row and column family. It can
be read as `map<key : string, map<column_family : string, list<Mutation>>>`.

A `Mutation` specifies either columns to insert or columns to delete. See `Mutation` and `Deletion`
above for more details.
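
The nesting of `mutation_map` is easier to see when it is built step by step. The sketch below uses plain Python dicts as stand-ins for the Thrift `Mutation` structures, and `build_mutation_map` is a hypothetical helper, not part of the API; the row keys, column family names, and field values are illustrative only.

```python
from typing import Any, Dict, Iterable, List, Tuple

Mutation = Dict[str, Any]  # stand-in: holds either "column_or_supercolumn" or "deletion"
MutationMap = Dict[str, Dict[str, List[Mutation]]]


def build_mutation_map(entries: Iterable[Tuple[str, str, Mutation]]) -> MutationMap:
    """Group (row_key, column_family, mutation) triples into the nested map."""
    mutation_map: MutationMap = {}
    for key, column_family, mutation in entries:
        mutation_map.setdefault(key, {}).setdefault(column_family, []).append(mutation)
    return mutation_map


insert_age = {"column_or_supercolumn": {"name": b"age", "value": b"30", "timestamp": 10}}
delete_old = {"deletion": {"timestamp": 11, "predicate": {"column_names": [b"old"]}}}

mutation_map = build_mutation_map([
    ("user1", "Standard1", insert_age),
    ("user1", "Standard1", delete_old),
    ("user2", "Standard2", insert_age),
])
print(sorted(mutation_map))
```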

=== remove ===
 . `remove(keyspace, key, column_path, timestamp, consistency_level)`

Remove data from the row specified by `key` at the granularity specified by `column_path`,
and the given `timestamp`.  Note that all the values in `column_path` besides `column_path.column_family`
are truly optional: you can remove the entire row by just specifying the !ColumnFamily, or
you can remove a !SuperColumn or a single Column by specifying those levels too. Note that
the `timestamp` is needed, so that if the commands are replayed in a different order on different
nodes, the same result is produced.
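
Why the `timestamp` makes replay order irrelevant can be shown with a simplified model. The sketch below is plain Python and is not Cassandra's actual reconciliation code: each operation is a `(timestamp, value)` pair, a value of `None` stands for a removal, and the operation with the higher timestamp wins. Ties are left out of the example.

```python
from functools import reduce
from itertools import permutations
from typing import Optional, Tuple

Op = Tuple[int, Optional[bytes]]  # (timestamp, value); value None marks a removal


def apply_op(current: Optional[Op], op: Op) -> Op:
    """Keep whichever of the stored state and the incoming op is newer."""
    if current is None or op[0] > current[0]:
        return op
    return current


ops = [(1, b"v1"), (2, None), (3, b"v3")]  # insert, remove, re-insert

# Apply the same three operations in every possible order.
outcomes = {reduce(apply_op, order, None) for order in permutations(ops)}
print(outcomes)
```

Every ordering ends in the same final state, which is the property the text above relies on.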

=== describe_keyspaces ===
 . `set<string> describe_keyspaces()`

Gets a list of all the keyspaces configured for the cluster.

=== describe_cluster_name ===
 . `string describe_cluster_name()`

Gets the name of the cluster.

=== describe_version ===
 . `string describe_version()`

Gets the Thrift API version.

=== describe_ring ===
 . `list<TokenRange> describe_ring(keyspace)`

Gets the token ring; a map of ranges to host addresses. Represented as a list of `TokenRange`
instead of a map from range to list of endpoints, because you can't use Thrift structs as
map keys. For the same reason, we can't return
a set here, even though order is neither important nor predictable.
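
On the client side, the returned list can be turned back into the map it represents. The sketch below uses plain Python dicts as stand-ins for `TokenRange` structures; `ring_to_map` is a hypothetical helper, and the tokens and addresses are made-up values.

```python
from typing import Dict, List, Tuple


def ring_to_map(token_ranges: List[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Index endpoints by (start_token, end_token); tuples are hashable, structs are not."""
    return {
        (tr["start_token"], tr["end_token"]): tr["endpoints"]
        for tr in token_ranges
    }


ring = [
    {"start_token": "0", "end_token": "100", "endpoints": ["10.0.0.1", "10.0.0.2"]},
    {"start_token": "100", "end_token": "0", "endpoints": ["10.0.0.2", "10.0.0.1"]},
]
print(ring_to_map(ring)[("0", "100")])
```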

=== describe_keyspace ===
 . `map<string, map<string, string>> describe_keyspace(keyspace)`

Gets information about the specified keyspace.

== Examples ==
[[|There are a few examples on this page over
