cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5099) Since 1.1, get_count sometimes returns value smaller than actual column count
Date Thu, 10 Jan 2013 10:54:13 GMT


Sylvain Lebresne commented on CASSANDRA-5099:

In fact, I think this points to a more serious problem with TTL.

Because even get_count aside, if I do a get_slice with a particular count n, I expect that
if I get fewer than n values in my result, it means there are no more values matching whatever
predicate I've provided. I'd say that any other behavior is a bug. But with TTL, if a column
expires while the query is being processed, we may break that count guarantee (which is exactly
what hits us here).

In other words, I'm not sure that reintroducing the inefficiency of always doing one last,
almost always useless, query to make sure the paging is indeed over is the right fix, because
I doubt that people who do manual row paging using get_slice actually implement that "do one
last query just in case the previous query lied about the count returned" step.
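
To make the paging concern concrete, here is a minimal sketch of the kind of client-side loop
described above, which treats a page shorter than the requested count as the end of the row.
Nothing in it is Cassandra code: SliceSource, readAll, honest and lossy are hypothetical names
made up for illustration, and lossy merely imitates a column that expires after it was counted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class ManualPagingSketch {
    /** Hypothetical stand-in for a get_slice-style call: up to count column names >= start, in order. */
    interface SliceSource {
        List<String> getSlice(String start, int count);
    }

    /** Pages through a row, assuming that a page shorter than pageSize means the row is exhausted. */
    static List<String> readAll(SliceSource source, int pageSize) {
        if (pageSize < 2) {
            throw new IllegalArgumentException("pageSize must be >= 2 because the start bound is inclusive");
        }
        List<String> all = new ArrayList<>();
        String start = "";
        boolean first = true;
        while (true) {
            List<String> page = source.getSlice(start, pageSize);
            // The start bound is inclusive, so every page after the first repeats the last column seen.
            for (int i = first ? 0 : 1; i < page.size(); i++) {
                all.add(page.get(i));
            }
            if (page.size() < pageSize) {
                return all; // short page: the client believes paging is over
            }
            start = page.get(page.size() - 1);
            first = false;
        }
    }

    /** A source that always returns as many columns as it can. */
    static SliceSource honest(TreeSet<String> columns) {
        return (start, count) -> {
            List<String> out = new ArrayList<>();
            for (String c : columns.tailSet(start, true)) {
                if (out.size() == count) break;
                out.add(c);
            }
            return out;
        };
    }

    /** A source that drops one column after the page was counted, imitating a TTL expiry mid-query. */
    static SliceSource lossy(TreeSet<String> columns, String expiring) {
        SliceSource inner = honest(columns);
        return (start, count) -> {
            List<String> page = new ArrayList<>(inner.getSlice(start, count));
            page.remove(expiring);
            return page;
        };
    }
}
```

With seven columns and a page size of 3, the honest source yields all seven, while the lossy
source makes the loop stop after the first page even though most of the row is still unread.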

What I would suggest instead would be to have CassandraServer save the current time before
doing a get_slice, and use that time in thriftifyColumns when deciding if a column is considered
expired or not. That way we won't skip an expiring column from the result set if it had been
counted as live during the actual internal query.
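
A rough sketch of that idea follows, under stated assumptions: it is not the actual
CassandraServer or thriftifyColumns code, and Column, readLive, toResult and getSlice are
hypothetical names. It only shows that evaluating expiry against one saved timestamp in both
the internal read and the conversion step keeps the returned count consistent.

```java
import java.util.ArrayList;
import java.util.List;

final class QueryTimeSnapshotSketch {
    /** Hypothetical column: a name plus an absolute expiry time in milliseconds. */
    static final class Column {
        final String name;
        final long expiresAtMillis;

        Column(String name, long expiresAtMillis) {
            this.name = name;
            this.expiresAtMillis = expiresAtMillis;
        }

        boolean isLive(long nowMillis) {
            return nowMillis < expiresAtMillis;
        }
    }

    /** Internal read: collects up to count columns that are live at readTimeMillis. */
    static List<Column> readLive(List<Column> row, int count, long readTimeMillis) {
        List<Column> out = new ArrayList<>();
        for (Column c : row) {
            if (out.size() == count) break;
            if (c.isLive(readTimeMillis)) out.add(c);
        }
        return out;
    }

    /** Conversion step: drops columns that are expired at convertTimeMillis. */
    static List<String> toResult(List<Column> columns, long convertTimeMillis) {
        List<String> names = new ArrayList<>();
        for (Column c : columns) {
            if (c.isLive(convertTimeMillis)) names.add(c.name);
        }
        return names;
    }

    /**
     * Runs both steps. Passing the same value for both times models the suggested fix
     * (one timestamp saved before the query); passing a later convert time models the
     * current behavior, where the clock is read again during conversion.
     */
    static List<String> getSlice(List<Column> row, int count, long readTimeMillis, long convertTimeMillis) {
        return toResult(readLive(row, count, readTimeMillis), convertTimeMillis);
    }
}
```

For a row where the first column expires between the two steps, a later conversion time returns
fewer than count columns although another live column exists, while reusing the saved time
returns exactly count.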


> Since 1.1, get_count sometimes returns value smaller than actual column count
> -----------------------------------------------------------------------------
>                 Key: CASSANDRA-5099
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.7
>            Reporter: Jason Harvey
>            Assignee: Yuki Morishita
>             Fix For: 1.1.9
>         Attachments: 5099-1.1.txt
> We have a CF where rows have thousands of TTLd columns. The columns are continually added
> at a regular rate, and TTL out after 15 minutes. We continually run a 'get_count' on these
> keys to get a count of the number of live columns.
> Since we upgraded from 1.0 to 1.1.7, "get_count" regularly returns much smaller values
> than are possible. For example, with roughly 15,000 columns that have well-distributed TTLs,
> running a get_count 10 times will result in 1 or 2 results that are up to half the actual
> column count. Using a normal 'get' to count those columns always results in proper values.

> For example:
> (all of these counts were run within a second or less of each other)
> {code}
> [default@reddit] count  AccountsActiveBySR['2qh0u'];
> 13665 columns
> [default@reddit] count  AccountsActiveBySR['2qh0u'];
> 13665 columns
> [default@reddit] count  AccountsActiveBySR['2qh0u'];
> 13666 columns
> [default@reddit] count  AccountsActiveBySR['2qh0u'];
> 3069 columns
> [default@reddit] count  AccountsActiveBySR['2qh0u'];
> 13660 columns
> [default@reddit] count  AccountsActiveBySR['2qh0u'];
> 13661 columns
> {code}
> I should note that this issue happens much more frequently with larger (>10k columns)
> rows than smaller rows. It never seems to happen with rows having fewer than 1k columns.
> There are no supercolumns in use. The key names and column names are very short, and
> there are no column values. The CF is LCS, and due to the TTL only hovers around a few MB
> in size. GC grace is normally at zero, but the problem is consistent with non-zero gc grace.
> It appears that there was an issue (CASSANDRA-4833) fixed in 1.1.7 regarding get_count.
> Some logic was added to prevent an infinite loop case. Could that change have resulted in
> this problem somehow? I can't find any other relevant 1.1 changes that might explain this.
