cassandra-commits mailing list archives

From "Lerh Chuan Low (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS
Date Fri, 19 Jan 2018 00:23:01 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16331469#comment-16331469 ]

Lerh Chuan Low edited comment on CASSANDRA-8460 at 1/19/18 12:22 AM:
---------------------------------------------------------------------

Also just bumping this: [~jjirsa] or [~bdeggleston], do you still have plans for it?
It looks like most of the code changes in the patch you had previously
(https://github.com/jeffjirsa/cassandra/commit/cc0ab8f733eef63ed0eaea30cc6f471b467c3ec5#diff-f628011a74763c0d0abc369bc8f5762bR126)
are still applicable. I am willing to give it a go.

It sounds like we are still uncertain about how to implement this. My original thinking
is in line with Jeff's: the archive directories would also keep an instance of {{XCompactionStrategy}}
running for each of the repaired, unrepaired and pending-repair sets. Archived SSTables will
still have to be read eventually, when running repairs or streaming to a newly added node, so
it looks less and less ideal to move them into an archive directory and never touch them
again. That said, I'm happy to implement it however people think is best, since there may be
things that are not obvious to me. In this design, flushing won't be aware that an archive
directory exists and will keep flushing to the regular {{data_directories}}; compaction will
eventually pick the SSTables up and move them into {{archive_data_directories}}, if applicable.
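A minimal Java sketch of the compaction-driven placement described above, under a deliberately simplified in-memory model: `RepairState`, `Tier`, `SSTable` and `ArchivingPlacement` are hypothetical names, not Cassandra classes, and the per-set lists only stand in for the per-set compaction strategy instances. Flushes always land in the hot tier, and a finished compaction's output moves to the archive tier only when its newest data is older than an assumed threshold.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical repair states; each gets its own bucket (strategy instance) per tier. */
enum RepairState { REPAIRED, UNREPAIRED, PENDING_REPAIR }

/** HOT stands for data_directories, ARCHIVE for archive_data_directories. */
enum Tier { HOT, ARCHIVE }

/** Hypothetical SSTable view: only what the placement decision needs. */
record SSTable(String name, RepairState repairState, long maxTimestampMicros) {}

/** Sketch: flushes go to HOT; compaction output goes to ARCHIVE once old enough. */
final class ArchivingPlacement {
    private final long archiveAfterMicros;
    // Stand-in for "one compaction strategy instance per (tier, repair state)".
    private final Map<Tier, Map<RepairState, List<SSTable>>> managed = new EnumMap<>(Tier.class);

    ArchivingPlacement(long archiveAfterMicros) {
        this.archiveAfterMicros = archiveAfterMicros;
        for (Tier tier : Tier.values()) {
            Map<RepairState, List<SSTable>> byState = new EnumMap<>(RepairState.class);
            for (RepairState state : RepairState.values()) {
                byState.put(state, new ArrayList<>());
            }
            managed.put(tier, byState);
        }
    }

    /** Flushing is unaware of the archive tier and always writes to HOT. */
    void onFlush(SSTable flushed) {
        managed.get(Tier.HOT).get(flushed.repairState()).add(flushed);
    }

    /** Chooses the output tier from the age of the newest data in the SSTable. */
    Tier tierFor(SSTable output, long nowMicros) {
        return nowMicros - output.maxTimestampMicros() > archiveAfterMicros ? Tier.ARCHIVE : Tier.HOT;
    }

    /** Removes the compacted inputs wherever they live and files the output by age. */
    Tier onCompactionFinished(List<SSTable> inputs, SSTable output, long nowMicros) {
        for (Map<RepairState, List<SSTable>> byState : managed.values()) {
            for (List<SSTable> sstables : byState.values()) {
                sstables.removeAll(inputs);
            }
        }
        Tier tier = tierFor(output, nowMicros);
        managed.get(tier).get(output.repairState()).add(output);
        return tier;
    }

    List<SSTable> sstables(Tier tier, RepairState state) {
        return managed.get(tier).get(state);
    }
}
```

Because the archive tier keeps its own buckets per repair state, archived SSTables stay visible to repair and streaming rather than being parked and forgotten.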

On that note, one thing I haven't been able to wrap my head around yet is whether the archive
directory needs the same guarantee as a multiple data directories setup, i.e. that a single
vnode/token range never spans both it and another directory, in which case we would have
to include it when distributing token ranges across the data directories.
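To make the token-range question concrete, a hedged sketch of that kind of guarantee: the token space is cut into contiguous slices, one per directory, so every token maps to exactly one directory. It assumes the Murmur3 token space (`Long.MIN_VALUE` to `Long.MAX_VALUE`) split evenly; `DirectoryBoundaries` is a hypothetical name, and real disk boundaries would presumably be derived from the node's locally owned ranges rather than the whole ring. Whether the archive directory should count toward `directoryCount` is exactly the open question.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Sketch: evenly split the Murmur3 token space so each token maps to one directory. */
final class DirectoryBoundaries {
    private final List<Long> upperBounds; // inclusive upper bound of each directory's slice

    DirectoryBoundaries(int directoryCount) {
        if (directoryCount < 1) {
            throw new IllegalArgumentException("need at least one directory");
        }
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        // 2^64 tokens in total; BigInteger avoids long overflow.
        BigInteger span = BigInteger.valueOf(Long.MAX_VALUE).subtract(min).add(BigInteger.ONE);
        List<Long> bounds = new ArrayList<>();
        for (int i = 1; i <= directoryCount; i++) {
            // Upper bound of slice i = min + span * i / n - 1; the last one is Long.MAX_VALUE.
            BigInteger bound = min
                    .add(span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(directoryCount)))
                    .subtract(BigInteger.ONE);
            bounds.add(bound.longValueExact());
        }
        this.upperBounds = List.copyOf(bounds);
    }

    /** Index of the single directory that owns the given token. */
    int directoryFor(long token) {
        for (int i = 0; i < upperBounds.size(); i++) {
            if (token <= upperBounds.get(i)) {
                return i;
            }
        }
        throw new IllegalStateException("unreachable: the last bound is Long.MAX_VALUE");
    }

    List<Long> upperBounds() {
        return upperBounds;
    }
}
```

In this scheme a directory is chosen by token rather than by SSTable age, so simply counting the archive directory as one more slice would also route fresh flushes for that slice to it.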

[~stone] does raise an interesting point, though, about decoupling this from the compaction
strategy (CS) and instead using a periodic background task that archives SSTables. I'm guessing
that in this case you would archive based on the SSTable metadata's min/max timestamp? Or just
on the last-modified time of the SSTable files? Would it be a YAML property, so that any SSTable
whose max timestamp is more than X days old gets archived?
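A sketch of that decoupled alternative, supporting both criteria from the question. The threshold would come from a YAML property (the name `archive_after_days` is hypothetical); `SSTableInfo`, `ArchiveCriterion` and `PeriodicArchiver` are invented for illustration, and the actual file move is left to a caller-supplied callback. It assumes write timestamps are microseconds since the epoch, which is the usual convention for Cassandra cell timestamps but is not enforced.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.stream.Collectors;

/** Hypothetical SSTable view with only the fields this sketch needs. */
record SSTableInfo(String name, long maxTimestampMicros, Instant lastModified) {}

/** The two candidate archiving criteria. */
enum ArchiveCriterion { MAX_TIMESTAMP, LAST_MODIFIED }

/** Hypothetical background archiver, decoupled from the compaction strategy. */
final class PeriodicArchiver {
    private final Duration archiveAfter; // e.g. from a hypothetical archive_after_days property
    private final ArchiveCriterion criterion;

    PeriodicArchiver(Duration archiveAfter, ArchiveCriterion criterion) {
        this.archiveAfter = archiveAfter;
        this.criterion = criterion;
    }

    /** True if the SSTable is older than the threshold under the chosen criterion. */
    boolean shouldArchive(SSTableInfo sstable, Instant now) {
        Instant reference = switch (criterion) {
            // Newest write in the SSTable, assuming microseconds since the epoch.
            case MAX_TIMESTAMP -> Instant.EPOCH.plus(sstable.maxTimestampMicros(), ChronoUnit.MICROS);
            // Filesystem mtime; compaction writes new files, so this resets on every rewrite.
            case LAST_MODIFIED -> sstable.lastModified();
        };
        return Duration.between(reference, now).compareTo(archiveAfter) > 0;
    }

    /** Filters the live SSTables down to those that should be archived now. */
    List<SSTableInfo> candidates(List<SSTableInfo> live, Instant now) {
        return live.stream().filter(s -> shouldArchive(s, now)).collect(Collectors.toList());
    }

    /** Runs an archiving pass every {@code period}; the move itself is the caller's job. */
    ScheduledFuture<?> schedule(ScheduledExecutorService executor,
                                Supplier<List<SSTableInfo>> liveSSTables,
                                Consumer<SSTableInfo> archive,
                                Duration period) {
        long periodMillis = period.toMillis();
        return executor.scheduleWithFixedDelay(
                () -> candidates(liveSSTables.get(), Instant.now()).forEach(archive),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }
}
```

The max-timestamp criterion reflects the age of the data itself, whereas last-modified only reflects when the file was last written, which can be recent for old data that was just compacted.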



was (Author: lerh low):
Also just bumping this: [~jjirsa] or [~bdeggleston], do you still have plans for it?
It looks like most of the code changes in the patch you had previously
(https://github.com/jeffjirsa/cassandra/commit/cc0ab8f733eef63ed0eaea30cc6f471b467c3ec5#diff-f628011a74763c0d0abc369bc8f5762bR126)
are still applicable. I am willing to give it a go.

It sounds like we are still uncertain about how to implement this. My original thinking
is in line with Jeff's: the archive directories would also keep an instance of {{XCompactionStrategy}}
running for each of the repaired, unrepaired and pending-repair sets. Archived SSTables will
still have to be read eventually, when running repairs or streaming to a newly added node, so
it looks less and less ideal to move them into an archive directory and never touch them
again. That said, I'm happy to implement it however people think is best, since there may be
things that are not obvious to me. In this design, flushing won't be aware that an archive
directory exists and will keep flushing to the regular {{data_directories}}; compaction will
eventually pick the SSTables up and move them into {{archive_data_directories}}, if applicable.

On that note, one thing I haven't been able to wrap my head around yet is whether the archive
directory needs the same guarantee as a multiple data directories setup, i.e. that a single
vnode/token range never spans both it and another directory.

[~stone] does raise an interesting point, though, about decoupling this from the compaction
strategy (CS) and instead using a periodic background task that archives SSTables. I'm guessing
that in this case you would archive based on the SSTable metadata's min/max timestamp? Or just
on the last-modified time of the SSTable files? Would it be a YAML property, so that any SSTable
whose max timestamp is more than X days old gets archived?


> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8460
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Priority: Major
>              Labels: doc-impacting, dtcs
>             Fix For: 4.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data directories where
> we move the sstables once they are older than max_sstable_age_days. 
> This would enable users to have a quick, small SSD for hot, new data, and big spinning
> disks for data that is rarely read and never compacted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

