cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] Updated: (CASSANDRA-1601) Refactor index definitions
Date Mon, 11 Oct 2010 17:21:33 GMT


Jonathan Ellis updated CASSANDRA-1601:

         Priority: Major  (was: Critical)
    Fix Version/s:     (was: 0.7.0)

This is a huge amount of feature creep to jam in at the end of 0.7.  (Nor do I think indexing
supercolumn data is even desirable.)  Pushing to 0.8.

> Refactor index definitions
> --------------------------
>                 Key: CASSANDRA-1601
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API
>            Reporter: Stu Hood
>             Fix For: 0.8
> h3. Overview
> There are a few considerations for defining secondary indexes and row validation that
I don't think have been brought up yet. While the interface is still malleable pre-0.7.0,
we should attempt to make changes that allow for forward compatibility of index/validator
schemas. This is an umbrella ticket for suggesting/debating the changes: other tickets should
be opened for quick improvements that can be made before 0.7.0.
> ----
> h3. Index output types
> The output (queryable) data from an indexing operation is what actually goes in the index.
For a particular row, the output can be either _single-valued_, _multi-valued_ or _compound_:
> * Single-valued
> ** Implemented in trunk (special case of multi-valued)
> * Multi-valued
> ** Multiple index values _of the same type_ can match a single row
> ** Row probably contains a list/set (perhaps in a supercolumn)
> * Compound
> ** Multiple base properties concatenated as one index entry 
> ** Different validators/comparators for each component
> ** (Given the simplicity of performing boolean operations on CASSANDRA-1472 indexes, compound local
indexes are unlikely to ever be worthwhile, but compound distributed indexes will be: see
comments on CASSANDRA-1599)
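
As a rough illustration of the three output shapes above (this is not Cassandra code; the row layout, the sample data and every function name are hypothetical), the index entries produced for a single row might look like this:

```python
# Hypothetical sketch: a row is a plain dict of column name -> value,
# and an index entry is an (indexed value, row key) pair.
row_key = "user42"
row = {"city": "Austin", "state": "TX", "tags": ["db", "java"]}

def single_valued(row_key, row, column):
    """One index entry per row."""
    return [(row[column], row_key)]

def multi_valued(row_key, row, column):
    """Several entries of the same type can point at the same row."""
    return [(value, row_key) for value in row[column]]

def compound(row_key, row, columns):
    """Several base properties combined into a single index entry."""
    return [(tuple(row[c] for c in columns), row_key)]

print(single_valued(row_key, row, "city"))
print(multi_valued(row_key, row, "tags"))
print(compound(row_key, row, ["state", "city"]))
```

In this sketch the single-valued case is just the multi-valued case with one value, which matches the "special case" note in the list.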
> h3. Index input types
> The other end of indexing is selection of values from a row to be indexed. Selection
can correspond directly to our current {{db.filter.*}} implementations, and may be best implemented
by specifying the validator/index using the same Thrift objects you would use for a similar query.
> * Name selection
> ** Implemented in trunk, but should probably just be a special case of list selection
> ** Corresponds to db.filter.NamesQueryFilter of size 1
> * List selection
> ** Should specify a list of columns of which all values must be of the same type, as
defined by the Validator
> ** Corresponds to db.filter.NamesQueryFilter
> * Range (prefix?) selection
> ** Subsets of a row may be interesting for indexing
> ** Range corresponds to db.filter.SliceQueryFilter
> *** (A Prefix might actually be more useful for indexing, but is better implemented by
indexing an arbitrarily nested row)
> ** Open question: might the ability to index only the 'top N values' from a row be useful?
If so, then this selector should allow N to be specified as it would be for a slice.
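
The selection types above can be sketched roughly as follows. This is a simplified stand-in, not the actual {{db.filter.*}} classes: the row is assumed to be a dict ordered by column name, range bounds are assumed inclusive, and all names are hypothetical.

```python
# Hypothetical sketch: a row is a dict of column name -> value, and
# columns are ordered by name, standing in for the column comparator.
row = {"a1": 5, "a2": 7, "b1": 1, "b2": 9}

def select_names(row, names):
    """List selection; name selection is the special case len(names) == 1."""
    return [(n, row[n]) for n in names if n in row]

def select_range(row, start, finish, count=None):
    """Range selection over sorted column names, optionally capped at count.

    Bounds are treated as inclusive here, which is an assumption of this sketch.
    """
    selected = [(n, row[n]) for n in sorted(row) if start <= n <= finish]
    return selected if count is None else selected[:count]

print(select_names(row, ["a2"]))
print(select_range(row, "a1", "b2", count=3))
```

Note that `count` caps the selection in column-name order, as a slice would; it does not pick the N largest values.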
> h3. Supercolumns/arbitrary-nesting
> Another consideration is that we should be able to support indexing and validation of
supercolumns (and hence, arbitrarily nested rows). Since the selection of columns to index
is essentially the same as the selection of columns to return for a query, this can probably
mirror (and suggest improvements to) our query API.
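
One way to picture selection in an arbitrarily nested row is a path of column names walked from the top level down. The sketch below assumes nested dicts as the row representation; `select_path` is a hypothetical helper, not part of any Cassandra API.

```python
# Hypothetical sketch: a nested row is modelled as dicts within dicts,
# and a selector is a list of column names leading to the value to index.
def select_path(row, path):
    """Return the value found by following path, or None if any step is missing."""
    node = row
    for name in path:
        if not isinstance(node, dict) or name not in node:
            return None
        node = node[name]
    return node

nested_row = {"address": {"city": "Austin", "geo": {"lat": 30.27}}}
print(select_path(nested_row, ["address", "city"]))
print(select_path(nested_row, ["address", "geo", "lat"]))
```

A path of length one reduces to plain name selection, so the flat and nested cases could share a single selector shape.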
> h3. UDFs
> This is obviously still an open area, but user defined indexing functions are essentially
a transform between the _input_ and _output_ (as defined above), which would normally have
equal structures. Leaving room for UDFs in our index definitions makes sense, and will likely
lead to a much more general and elegant design.
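
To make the "transform between input and output" idea concrete, here is a minimal sketch in which an index definition is a selector plus an optional user-defined function. All names (`build_entries`, `select_email`, `lowercase`) and the sample data are hypothetical; nothing here reflects an existing Cassandra interface.

```python
# Hypothetical sketch: the selector produces the input values, the optional
# transform (the UDF) maps them to output values, and each output value
# becomes an (indexed value, row key) entry.
def build_entries(row_key, row, selector, transform=None):
    selected = selector(row)
    if transform is None:
        # Identity transform: input and output have equal structure.
        transform = lambda values: values
    return [(value, row_key) for value in transform(selected)]

row = {"email": "Alice@Example.COM", "name": "Alice"}
select_email = lambda r: [r["email"]]
lowercase = lambda values: [v.lower() for v in values]

print(build_entries("k1", row, select_email))
print(build_entries("k1", row, select_email, lowercase))
```

Without a transform the definition behaves like a plain index on the selected column; with one, the same definition yields a normalized index without any change to the selector.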

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
