cassandra-commits mailing list archives

From "Branimir Lambov (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.
Date Wed, 18 Jan 2017 13:56:26 GMT


Branimir Lambov commented on CASSANDRA-10520:

Updated and rebased branch for 4.0:
|[trunk patch|]|[utest|]|[dtest|]|[dtest

Performance according to the attached JMH microbenchmark, with the chunk cache disabled
({{file_cache_size: 32}} in {{cassandra.yaml}}) to force every data access to perform a read:
ReadWriteTestCompression.readFixed  mmap             {compression off}             avgt  15   4.281 ± 0.048  us/op
ReadWriteTestCompression.readFixed  mmap             {LZ4, crc_check_chance: 0}    avgt  15   7.286 ± 0.107  us/op
ReadWriteTestCompression.readFixed  mmap             {LZ4}                         avgt  15   9.744 ± 0.085  us/op
ReadWriteTestCompression.readFixed  mmap             {LZ4, min_compress_ratio: 0}  avgt  15  14.353 ± 0.189  us/op
ReadWriteTestCompression.readFixed  mmap_index_only  {compression off}             avgt  15   5.264 ± 0.037  us/op
ReadWriteTestCompression.readFixed  mmap_index_only  {LZ4, crc_check_chance: 0}    avgt  15   8.284 ± 0.082  us/op
ReadWriteTestCompression.readFixed  mmap_index_only  {LZ4}                         avgt  15  11.662 ± 0.147  us/op
ReadWriteTestCompression.readFixed  mmap_index_only  {LZ4, min_compress_ratio: 0}  avgt  15  17.910 ± 0.110  us/op
Within each access mode, the last option is the baseline, which always stores chunks compressed.
The third uses the default threshold (1.1), which in this case stores the chunks uncompressed.
The second has CRC checks turned off, and the first has compression turned off (which also means
no CRC checks). Compared to the baseline, the new code with the default threshold saves about
4.6 microseconds of the query time with mmap and about 6.2 with mmap_index_only.

It is possible to construct a special compressed rebufferer that is zero-copy and cache-bypassing
for non-compressed memmapped or cached chunks, but that adds some complexity and IMHO should
be done in a separate patch.

> Compressed writer and reader should support non-compressed data.
> ----------------------------------------------------------------
>                 Key: CASSANDRA-10520
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>              Labels: messaging-service-bump-required
>             Fix For: 4.x
> Compressing incompressible data, as done, for instance, to write SSTables during stress-tests,
> results in chunks larger than 64k which are a problem for the buffer pooling mechanisms employed
> by the {{CompressedRandomAccessReader}}. This results in non-negligible performance issues
> due to excessive memory allocation.
> To solve this problem and avoid decompression delays in the cases where it does not provide
> benefits, I think we should allow compressed files to store uncompressed chunks as an alternative
> to compressed data. Such a chunk could be written after compression returns a buffer larger
> than, for example, 90% of the input, and would not result in additional delays in writing.
> On reads it could be recognized by size (using a single global threshold constant in the
> compression metadata) and data could be directly transferred into the decompressed buffer,
> skipping the decompression step and ensuring a 64k buffer for compressed data always suffices.
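
The scheme in the description above can be illustrated with a minimal sketch. This is not the
patch's code: the class and method names are hypothetical, {{java.util.zip.Deflater}} stands in
for LZ4 so the example needs only the JDK, the 90% threshold is the example value from the
description, and only full 64k chunks are handled (a shorter final chunk would need extra care
so that its stored size stays unambiguous).

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only; names and constants are assumptions, not Cassandra APIs.
class ChunkThresholdSketch {
    static final int CHUNK_LENGTH = 64 * 1024;
    // Assumed global threshold: chunks stored larger than this are uncompressed.
    static final int MAX_COMPRESSED_LENGTH = (int) (CHUNK_LENGTH * 0.9);

    // Writer side: keep the compressed form only if it fits within the threshold.
    static byte[] writeChunk(byte[] chunk) {
        if (chunk.length != CHUNK_LENGTH)
            throw new IllegalArgumentException("sketch handles full chunks only");
        Deflater deflater = new Deflater();
        deflater.setInput(chunk);
        deflater.finish();
        // The output buffer is capped at the threshold, so it never exceeds 64k.
        byte[] out = new byte[MAX_COMPRESSED_LENGTH];
        int n = deflater.deflate(out);
        boolean fits = deflater.finished();
        deflater.end();
        // Compression did not pay off: store the raw bytes instead.
        return fits ? Arrays.copyOf(out, n) : chunk.clone();
    }

    // Reader side: the stored size alone tells whether decompression is needed.
    static byte[] readChunk(byte[] stored) {
        if (stored.length > MAX_COMPRESSED_LENGTH)
            return stored.clone(); // uncompressed chunk, plain copy
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        byte[] out = new byte[CHUNK_LENGTH];
        int n = 0;
        try {
            while (!inflater.finished() && n < out.length) {
                int r = inflater.inflate(out, n, out.length - n);
                if (r == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    break; // truncated or unsupported input
                n += r;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt chunk", e);
        } finally {
            inflater.end();
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        byte[] compressible = new byte[CHUNK_LENGTH]; // all zeros
        byte[] random = new byte[CHUNK_LENGTH];
        new Random(42).nextBytes(random);             // effectively incompressible
        for (byte[] chunk : new byte[][] { compressible, random }) {
            byte[] stored = writeChunk(chunk);
            boolean roundTrip = Arrays.equals(chunk, readChunk(stored));
            System.out.println("stored " + stored.length + " bytes, round trip ok: " + roundTrip);
        }
    }
}
```

In this sketch the stored size is always either at most the threshold (compressed) or exactly
64k (raw), so the reader needs no per-chunk flag and no buffer larger than 64k.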

This message was sent by Atlassian JIRA
