cassandra-commits mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-767) Row keys should be byte[]s, not Strings
Date Sun, 01 Aug 2010 11:42:18 GMT


Uwe Schindler commented on CASSANDRA-767:

About Lucandra: currently all keys in Lucene are valid UTF-8 encoded bytes, so making them
Strings in Cassandra is fine. As Todd Nine said, this also holds for numeric terms: they use
only 7 bits of each byte, so they are valid UTF-8. (There was, however, a bug in Cassandra
that trimmed keys; it is now fixed.)
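To make the 7-bit argument concrete, here is a minimal Java sketch. It assumes a hypothetical encoding that packs a long into 7-bit groups; the names SevenBitKeys, encode and isValidUtf8 are made up for illustration, and this is not Lucene's actual NumericUtils format. Because every output byte has its high bit clear, the key is plain ASCII and therefore valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class SevenBitKeys {
    // Hypothetical encoding (NOT Lucene's real format): split a long into
    // 7-bit groups, most significant group first. Each byte is 0x00..0x7F.
    static byte[] encode(long value) {
        // Flip the sign bit so negative values sort before positive ones
        // when the bytes are compared as unsigned.
        long sortable = value ^ Long.MIN_VALUE;
        byte[] out = new byte[10]; // ceil(64 / 7) = 10 groups
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = (byte) (sortable & 0x7F);
            sortable >>>= 7;
        }
        return out;
    }

    // Strict check: returns false if the bytes are not well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (long v : new long[] {Long.MIN_VALUE, -1L, 0L, 42L, Long.MAX_VALUE}) {
            System.out.println(v + " -> valid UTF-8: " + isValidUtf8(encode(v)));
        }
    }
}
```

Every value in the loop reports true; a single byte such as 0xFF, by contrast, is rejected by isValidUtf8.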

Lucene trunk has now migrated to pure byte[] terms, so Lucandra will do the same. It is
therefore no longer guaranteed that the terms in a Lucene index can be represented as Strings.
Also, for several Lucene queries to work correctly, keys must be ordered as native unsigned
byte[], not by UTF-16 code units (String.compareTo()).
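A small Java sketch of why the two orders can disagree: String.compareTo() compares UTF-16 code units, so a supplementary character (stored as a surrogate pair starting with 0xD83D) sorts before U+FFFD, while its UTF-8 bytes sort after it. The class and method names are hypothetical; on Java 9+ java.util.Arrays.compareUnsigned(byte[], byte[]) gives the same unsigned order. Note the `& 0xFF`: Java bytes are signed, so a naive byte-by-byte comparison would be wrong too.

```java
import java.nio.charset.StandardCharsets;

final class KeyOrder {
    // Lexicographic comparison treating each byte as unsigned (0..255).
    // If one array is a prefix of the other, the shorter one sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF; // undo Java's signed byte
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        String replacement = "\uFFFD";  // U+FFFD, a BMP character
        String emoji = "\uD83D\uDE00";  // U+1F600, a surrogate pair in UTF-16

        // UTF-16 code units: 0xFFFD > 0xD83D, so this is positive
        int utf16 = replacement.compareTo(emoji);

        // UTF-8: U+FFFD = EF BF BD, U+1F600 = F0 9F 98 80; 0xEF < 0xF0
        int bytes = compareUnsigned(
            replacement.getBytes(StandardCharsets.UTF_8),
            emoji.getBytes(StandardCharsets.UTF_8));

        System.out.println("UTF-16 order: " + Integer.signum(utf16));
        System.out.println("unsigned UTF-8 order: " + Integer.signum(bytes));
    }
}
```

This prints 1 for the UTF-16 order and -1 for the unsigned byte order: the same two keys sort in opposite directions.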

Additionally, the term encoding in Lucene trunk (aka 4.0) will change to BOCU-1 for better
space efficiency with Eastern languages, and numeric terms will be saved as raw byte[] using
all 8 bits.
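One common way to store a number as a raw, full 8-bit key is sketched below, assuming big-endian bytes with the sign bit flipped so that unsigned byte order matches numeric order. This is an illustration, not necessarily the format Lucene 4.0 will use, and RawLongKeys is a hypothetical name. The point is that such keys are generally not valid UTF-8 (encode(0) starts with 0x80, a lone continuation byte), so they cannot safely round-trip through a String.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RawLongKeys {
    // Hypothetical 8-byte key: big-endian (ByteBuffer's default order) with
    // the sign bit flipped, so unsigned byte order equals numeric order.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    static long decode(byte[] key) {
        return ByteBuffer.wrap(key).getLong() ^ Long.MIN_VALUE;
    }

    // Strict check: returns false if the bytes are not well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long[] values = {Long.MIN_VALUE, -1L, 0L, 1L, Long.MAX_VALUE};
        for (int i = 0; i < values.length; i++) {
            byte[] key = encode(values[i]);
            // Arrays.compareUnsigned(byte[], byte[]) requires Java 9+
            boolean sorted = i == 0 || Arrays.compareUnsigned(encode(values[i - 1]), key) < 0;
            System.out.println(values[i] + ": round-trips=" + (decode(key) == values[i])
                + ", sorted=" + sorted + ", valid UTF-8=" + isValidUtf8(key));
        }
    }
}
```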

> Row keys should be byte[]s, not Strings
> ---------------------------------------
>                 Key: CASSANDRA-767
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>            Priority: Critical
>             Fix For: 0.7 beta 1
>         Attachments: 0001-Implement-compaction-benchmark.patch,
>                      0002-Implement-a-legacy-sstable-test.patch,
>                      0003-Store-bytes-in-DecoratedKey-and-cleanup-dead-code.patch,
>                      0004-Extract-read-writeName.patch,
>                      0005-Convert-IPartitioner-disk-key-format-to-bytes.patch,
>                      0006-Bump-SSTable-version-to-c-remove-utf16-encoding-from.patch
> This issue has come up numerous times, and we've dealt with a lot of pain because of it: let's get it knocked out.
> Keys being Java Strings can make it painful to use Cassandra from other languages, encoding binary data like integers as Strings is very inefficient, and there is a disconnect between our column data types and the plain String treatment we give row keys.
> The key design decision that needs discussion is: Should we apply the column AbstractTypes to row keys? If so, how do Partitioners change?
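To put a number on the efficiency point in the issue description, this hypothetical Java snippet compares a long stored as a decimal String with the same value as raw bytes. It also shows that decimal Strings do not sort numerically, which is one face of the disconnect with typed column comparators. The class name KeySize and its helpers are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class KeySize {
    // Key as a decimal String, encoded to UTF-8 bytes
    static byte[] stringKey(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Key as a fixed-width 8-byte big-endian value
    static byte[] rawKey(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public static void main(String[] args) {
        long id = Long.MIN_VALUE; // "-9223372036854775808", 20 characters
        System.out.println("decimal UTF-8: " + stringKey(id).length + " bytes");
        System.out.println("decimal UTF-16: "
            + Long.toString(id).getBytes(StandardCharsets.UTF_16BE).length + " bytes");
        System.out.println("raw: " + rawKey(id).length + " bytes");

        // Decimal strings sort lexicographically, not numerically: "10" < "9"
        System.out.println("\"10\" sorts before \"9\": " + ("10".compareTo("9") < 0));
    }
}
```

For Long.MIN_VALUE this prints 20 bytes (UTF-8), 40 bytes (UTF-16) and 8 bytes (raw), and the last line prints true.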

