cassandra-commits mailing list archives

From "Alex Liu (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4421) Support cql3 table definitions in Hadoop InputFormat
Date Thu, 02 May 2013 18:40:17 GMT


Alex Liu commented on CASSANDRA-4421:

The user needs to pass the following settings to the job:
1. Keyspace and column family name
2. Initial host, port and partitioner
3. Column names to retrieve (optional); the default is all columns
4. Number of CQL rows per page (optional); the default is 1000
5. User-defined WHERE clauses on indexed columns (optional)
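
As a rough illustration of the settings above, here is a hypothetical Java holder that applies the stated defaults (all columns, 1000 rows per page). The class and field names are made up for this sketch and are not Cassandra's ConfigHelper API; rendering "all columns" as SELECT * is also an assumption.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

/**
 * Hypothetical holder for the job settings listed above. The class and field
 * names are illustrative only; this is not Cassandra's ConfigHelper API.
 */
final class CqlJobSettings {
    static final int DEFAULT_PAGE_ROW_SIZE = 1000;

    final String keyspace;
    final String columnFamily;
    final String initialHost;
    final int port;
    final String partitioner;
    final List<String> columns;          // empty means "all columns"
    final int pageRowSize;               // CQL rows fetched per page
    final Optional<String> whereClause;  // user-defined clause on indexed columns

    CqlJobSettings(String keyspace, String columnFamily, String initialHost, int port,
                   String partitioner, List<String> columns, Integer pageRowSize,
                   String whereClause) {
        // Required settings (items 1 and 2)
        this.keyspace = Objects.requireNonNull(keyspace, "keyspace");
        this.columnFamily = Objects.requireNonNull(columnFamily, "columnFamily");
        this.initialHost = Objects.requireNonNull(initialHost, "initialHost");
        this.partitioner = Objects.requireNonNull(partitioner, "partitioner");
        if (port <= 0 || port > 65535) {
            throw new IllegalArgumentException("invalid port: " + port);
        }
        this.port = port;
        // Optional settings (items 3 to 5) fall back to the defaults stated above
        this.columns = columns == null ? List.of() : List.copyOf(columns);
        this.pageRowSize = pageRowSize == null ? DEFAULT_PAGE_ROW_SIZE : pageRowSize;
        if (this.pageRowSize <= 0) {
            throw new IllegalArgumentException("page row size must be positive");
        }
        this.whereClause = Optional.ofNullable(whereClause);
    }

    /** Column list for the SELECT; "*" when no columns were given. */
    String selectList() {
        return columns.isEmpty() ? "*" : String.join(", ", columns);
    }

    public static void main(String[] args) {
        CqlJobSettings s = new CqlJobSettings("test", "test1", "localhost", 9160,
                "org.apache.cassandra.dht.RandomPartitioner", null, null, null);
        System.out.println(s.selectList() + ", " + s.pageRowSize + " rows per page");
    }
}
```

Required settings fail fast in the constructor, so a misconfigured job is caught before any split is read.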

The input format's key type is List<IColumn> and its value type is Map<ByteBuffer, IColumn>:
List<IColumn> holds the key columns, i.e. the partition keys and clustering keys;
Map<ByteBuffer, IColumn> maps each CQL query output column name to its column.
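
A minimal sketch of that key/value shape, using a hypothetical Column class in place of IColumn so it runs without Cassandra on the classpath. Whether the value map also carries the key columns is not stated above; this sketch assumes it does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: split one CQL result row into the (key, value) pair described above. */
final class CqlRowSplit {
    /** Hypothetical stand-in for Cassandra's IColumn: a column name plus its serialized value. */
    static final class Column {
        final ByteBuffer name;
        final ByteBuffer value;
        Column(ByteBuffer name, ByteBuffer value) {
            this.name = name;
            this.value = value;
        }
    }

    static ByteBuffer name(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    final List<Column> keys = new ArrayList<>();                  // partition keys, then clustering keys
    final Map<ByteBuffer, Column> values = new LinkedHashMap<>(); // output column name -> column

    /**
     * @param row      the columns of one result row, in SELECT order
     * @param keyNames the partition key names followed by the clustering key names
     */
    static CqlRowSplit split(List<Column> row, List<String> keyNames) {
        CqlRowSplit out = new CqlRowSplit();
        // Every output column goes into the map (assumed to include the key columns too)
        for (Column c : row) {
            out.values.put(c.name, c);
        }
        // Key columns are collected in key order, independent of SELECT order
        for (String k : keyNames) {
            Column c = out.values.get(name(k));
            if (c == null) {
                throw new IllegalArgumentException("key column missing from row: " + k);
            }
            out.keys.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Column> row = List.of(
                new Column(name("a"), ByteBuffer.allocate(4).putInt(0, 1)),
                new Column(name("b"), ByteBuffer.allocate(4).putInt(0, 2)),
                new Column(name("c"), ByteBuffer.allocate(4).putInt(0, 3)));
        CqlRowSplit s = split(row, List.of("a", "b"));
        System.out.println(s.keys.size() + " key columns, " + s.values.size() + " output columns");
    }
}
```

ByteBuffer works as a map key here because its equals and hashCode compare the remaining bytes, so a freshly wrapped name finds the stored entry.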

Internally, we use the following CQL query:
  SELECT <columns> 
  FROM   <Column_family_name> 
  WHERE  <where_clause>
    AND  <user_defined_WhereClauses_on_indexed_column> 
  LIMIT  <page_row_size>
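
The query template above can be assembled roughly like this. CqlPageQuery.build is a hypothetical helper; it assumes column names, the column family name and the clauses are already valid CQL and does no quoting or escaping.

```java
/** Hypothetical helper that fills in the query template above. */
final class CqlPageQuery {
    static String build(String columns, String columnFamily, String whereClause,
                        String userWhereClause, int pageRowSize) {
        StringBuilder q = new StringBuilder()
                .append("SELECT ").append(columns)
                .append(" FROM ").append(columnFamily)
                .append(" WHERE ").append(whereClause);
        // The user-defined clause on indexed columns is optional
        if (userWhereClause != null && !userWhereClause.trim().isEmpty()) {
            q.append(" AND ").append(userWhereClause);
        }
        return q.append(" LIMIT ").append(pageRowSize).toString();
    }

    public static void main(String[] args) {
        System.out.println(build("a, b, c", "test1",
                "token(a) >= -100 AND token(a) <= 100", "c = 3", 1000));
    }
}
```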

<where_clause> takes one of the following forms:
 WHERE token(<partition_key>) >= <start_token>
   AND token(<partition_key>) <= <end_token>

 WHERE token(<partition_key>) > token(<partition_key_value>)
   AND token(<partition_key>) <= <end_token>

 WHERE token(<partition_key>) = token(<partition_key_value>)
   AND <clustering_key1> = <key_value1>
   AND <clustering_key2> > <key_value2>
   AND token(<partition_key>) <= <end_token>

 WHERE token(<partition_key>) = token(<partition_key_value>)
   AND <clustering_key1> = <key_value1>
   AND <clustering_key2> = <key_value2>
   AND <clustering_key3> > <key_value3>
   AND token(<partition_key>) <= <end_token>
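
These shapes can be generated by small string builders. This is a sketch: the method names are hypothetical, token and key values are passed in as already-rendered CQL literals, and withinPartition generalizes the last two forms (all but the last clustering key fixed with =, the last one with >) to any number of clustering keys.

```java
import java.util.List;

/** Hypothetical builders for the <where_clause> shapes listed above. */
final class CqlPagingWhere {
    /** First form: a fresh token range. */
    static String tokenRange(String pk, String startToken, String endToken) {
        return "token(" + pk + ") >= " + startToken
                + " AND token(" + pk + ") <= " + endToken;
    }

    /** Second form: continue after the partition with the given key value. */
    static String afterPartition(String pk, String pkValue, String endToken) {
        return "token(" + pk + ") > token(" + pkValue + ")"
                + " AND token(" + pk + ") <= " + endToken;
    }

    /**
     * Third and fourth forms: stay inside one partition, fixing every clustering key
     * except the last with '=' and requiring the last one to be strictly greater.
     */
    static String withinPartition(String pk, String pkValue, List<String> clusteringKeys,
                                  List<String> keyValues, String endToken) {
        if (clusteringKeys.isEmpty() || clusteringKeys.size() != keyValues.size()) {
            throw new IllegalArgumentException("need one value per clustering key");
        }
        StringBuilder sb = new StringBuilder("token(" + pk + ") = token(" + pkValue + ")");
        int last = clusteringKeys.size() - 1;
        for (int i = 0; i < last; i++) {
            sb.append(" AND ").append(clusteringKeys.get(i)).append(" = ").append(keyValues.get(i));
        }
        sb.append(" AND ").append(clusteringKeys.get(last)).append(" > ").append(keyValues.get(last));
        sb.append(" AND token(").append(pk).append(") <= ").append(endToken);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(tokenRange("a", "-100", "100"));
        System.out.println(afterPartition("a", "5", "100"));
        System.out.println(withinPartition("a", "5", List.of("b", "c"), List.of("1", "2"), "100"));
    }
}
```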

> Support cql3 table definitions in Hadoop InputFormat
> ----------------------------------------------------
>                 Key: CASSANDRA-4421
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: API
>    Affects Versions: 1.1.0
>         Environment: Debian Squeeze
>            Reporter: bert Passek
>              Labels: cql3
>             Fix For: 1.2.5
> Hello,
> I ran into a bug while writing composite column values and the subsequent validation on the server.
> This is the setup for reproduction:
> 1. create a keyspace
> create keyspace test with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor = 1;
> 2. create a cf via cql (3.0)
> create table test1 (
>     a int,
>     b int,
>     c int,
>     primary key (a, b)
> );
> Looking at the schema in the CLI, I noticed that there is no column metadata for columns that are not part of the primary key:
> create column family test1
>   with column_type = 'Standard'
>   and comparator = 'CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.UTF8Type)'
>   and default_validation_class = 'UTF8Type'
>   and key_validation_class = 'Int32Type'
>   and read_repair_chance = 0.1
>   and dclocal_read_repair_chance = 0.0
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
>   and caching = 'KEYS_ONLY'
>   and compression_options = {'sstable_compression' : ''};
> Note the default validation class: UTF8Type.
> Now I would like to insert a value > 127 via the Cassandra client (not CQL; this is part of MR jobs). Have a look at the attachment.
> Batch mutate fails:
> InvalidRequestException(why:(String didn't validate.) [test][test1][1:c] failed validation)
> The validator for the column value is fetched in ThriftValidation::validateColumnData, which always returns the default validator, UTF8Type as described above (the ColumnDefinition for the given column name "c" is always null).
> In UTF8Type there is a check for
> if (b > 127)
>    return false;
> Maybe I'm doing something wrong, but I used CQL 3.0 to create the table. I assigned data types to all columns, yet I cannot set values for a composite column because the default validation class is used.
> I think the schema should know the correct validator even for composite columns; using the default validation class does not make sense.
> Best Regards 
> Bert Passek
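
To see why the insert in the quoted report fails, the sketch below serializes ints the way Int32Type does (4 bytes, big-endian) and runs them through the JDK's strict UTF-8 decoder. The decoder is a stand-in for Cassandra's own UTF8Type validator, so edge cases may differ. Under it, rejection depends on the individual bytes rather than on the value simply exceeding 127: 256 encodes as 00 00 01 00 and decodes fine, while 128 (00 00 00 80) and 200 (00 00 00 C8) do not.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8ValidationDemo {
    /** Big-endian 4-byte encoding, which is how Int32Type serializes an int. */
    static ByteBuffer int32(int v) {
        return ByteBuffer.allocate(4).putInt(0, v);
    }

    /** Strict UTF-8 check: true if the bytes decode without malformed sequences. */
    static boolean isValidUtf8(ByteBuffer bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(bytes.duplicate()); // duplicate() leaves the caller's position untouched
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (int v : new int[] {1, 127, 128, 200, 256}) {
            System.out.println(v + " -> " + (isValidUtf8(int32(v)) ? "valid" : "rejected"));
        }
    }
}
```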

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators