cassandra-commits mailing list archives

From "Matthew F. Dennis (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-1046) optimize Memtable.getSliceIterator
Date Fri, 04 Jun 2010 19:56:57 GMT


Matthew F. Dennis commented on CASSANDRA-1046:

My profiler (which I'm not sure I trust at the moment) showed different results, and at no
point was I able to get timeouts in the client, even using numbers an order of magnitude
higher than originally reported.

So, I created some scripts to help test this (still didn't get client timeouts, perhaps because
of the UUID changes previously made). The insert script prints time UUIDs for the start, ~middle
and end of what was inserted. These can be fed into the reader to start from the middle and read
the specified number of columns out. I was running these scripts by piping the insertator
output to tee uuids and calling the readarator with `cat uuids`.

On my laptop these changes reduced the run time of the scripts from about 2.5 minutes to less
than 15 seconds (with reversed slices taking a couple seconds more in total).

In addition, I reviewed the callers of ColumnFamily.getSortedColumns (I did not review any
test classes). Everything was already iterating. In particular:

SSTableExport.SerializeRow already iterates
  .thriftifyColumns already iterates
  .thriftify[Super]Columns already iterates
Migration.getLocalMigrations already iterates
SSTableNameIterator.<init> only creates an iterator for later use
QueryFilter.getReduced only creates an iterator and then calls next()
Table.load already iterates
  .pagingFinished just calls size
  .deliverHintsToEndpoint already iterates
  .deliverAllHints already iterates
DefsTable.loadFromStorage already iterates
CompactionManager.submitGraveyardCleanup already iterates
ColumnIndexer.serialize already iterates
ColumnFamilySerializer.serializeForSSTable already iterates
  .toString already iterates
  .addAll already iterates 

> optimize Memtable.getSliceIterator
> ----------------------------------
>                 Key: CASSANDRA-1046
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.7
> As reported by James Golick, about 30% of the time in a read is spent in SliceQueryFilter.getMemColumnIterator,
> virtually all of which is in ConcurrentSkipListMap$Values.toArray().
> I wrote on the ML:
> Besides the UUID optimization you posted, we should do an audit of ColumnFamily.getSortedColumns
> and replace with iteration where possible (in this case, we'd be left with one copy of most
> of the columns, but that's better than two).
> We can get rid of the other copy by fixing the logic in Memtable.getSliceIterator, which
> says "copy all the columns, so we can do a binary search on them to find where to start,"
> but since columns are natively in sorted order we could just use an iterator and a while loo
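
The idea in the description above (iterate over the already-sorted columns instead of copying them all and binary searching) can be sketched roughly as follows. This is not the actual patch for this issue: the class SliceSketch, the method slice, and the use of Integer keys in place of time UUID column names are hypothetical stand-ins. The sketch relies only on ConcurrentSkipListMap's tailMap, headMap and descendingMap, which return views rather than copies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical stand-in for the memtable slice logic; not Cassandra code.
class SliceSketch {

    // Returns up to `count` column values starting at `start` (inclusive),
    // walking forward, or backward when `reversed` is true.
    static List<String> slice(ConcurrentSkipListMap<Integer, String> columns,
                              int start, int count, boolean reversed) {
        // tailMap/headMap/descendingMap are views over the skip list,
        // so no array of all columns is materialized up front.
        NavigableMap<Integer, String> view = reversed
                ? columns.headMap(start, true).descendingMap()
                : columns.tailMap(start, true);

        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, String> e : view.entrySet()) {
            if (out.size() >= count) {
                break; // stop as soon as the requested number of columns is read
            }
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, String> columns = new ConcurrentSkipListMap<>();
        for (int i = 0; i < 10; i++) {
            columns.put(i, "c" + i);
        }
        System.out.println(slice(columns, 5, 3, false)); // prints [c5, c6, c7]
        System.out.println(slice(columns, 5, 3, true));  // prints [c5, c4, c3]
    }
}
```

With this shape, a read that starts in the middle only touches the columns it returns, and a reversed slice differs only in which view it walks.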
