hbase-user mailing list archives

From Jerry He <jerry...@gmail.com>
Subject Re: Deleting and cleaning old snapshots exported to S3
Date Mon, 27 Nov 2017 22:56:41 GMT
Hi, Tim

You seem to have a nice solution/tool for this problem.  If you would
like to contribute it to HBase open source, that will certainly be
welcomed.
Once it is inside HBase, we can open up access to the needed methods.

Thanks.

On Wed, Nov 22, 2017 at 2:03 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> Logged HBASE-19333.
>
>
> On Wed, Nov 22, 2017 at 1:11 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> getSnapshotFiles returns a protobuf class. That is why it is private.
>>
>> If we create a POJO class for the SnapshotFileInfo it returns, I think
>> the method can become public.
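>>
>> As a rough sketch (the class name and fields here are hypothetical, just
>> mirroring what the protobuf carries), such a POJO might look like:
>>
>> public class SnapshotFileInfoPojo {
>>   public enum Type { HFILE, WAL }
>>
>>   private final Type type;
>>   private final String path;  // HFile path, or WAL name for WAL entries
>>
>>   public SnapshotFileInfoPojo(Type type, String path) {
>>     this.type = type;
>>     this.path = path;
>>   }
>>
>>   public Type getType() { return type; }
>>   public String getPath() { return path; }
>> }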
>>
>> Cheers
>>
>> -------- Original message --------
>> From: Timothy Brown <tim@siftscience.com>
>> Date: 11/22/17 12:52 PM (GMT-08:00)
>> To: user@hbase.apache.org
>> Subject: Re: Deleting and cleaning old snapshots exported to S3
>>
>> Hi Lex,
>>
>> We had a similar issue with our S3 bucket growing in size, so we wrote our
>> own cleaner. The cleaner first looks at the HFiles required by the current
>> snapshots. We then figure out which snapshots we no longer want (for
>> example, snapshots older than a week, or whatever rules you want). Then we
>> find the HFiles that are referenced only by these unwanted snapshots and
>> delete those HFiles from S3.
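>>
>> In outline it is just a set difference. Here is a minimal sketch, where
>> hfilesFor() is a hypothetical helper wrapping either of the two lookup
>> approaches described below, and fs is a FileSystem opened against the
>> S3 bucket:
>>
>> Set<String> required = new HashSet<>();
>> for (String snapshot : snapshotsToKeep) {
>>   required.addAll(hfilesFor(snapshot));   // HFiles still in use
>> }
>> Set<String> deletable = new HashSet<>();
>> for (String snapshot : unwantedSnapshots) {
>>   deletable.addAll(hfilesFor(snapshot));  // candidates for deletion
>> }
>> // keep only HFiles referenced solely by unwanted snapshots
>> deletable.removeAll(required);
>> for (String hfile : deletable) {
>>   fs.delete(new Path(hfile), false);      // remove the orphaned HFile
>> }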
>>
>> The tricky part is finding the HFiles for a given snapshot. There are two
>> ways to do this.
>>
>> 1) Use:
>>
>> SnapshotDescription snapshotDesc =
>>     SnapshotDescriptionUtils.readSnapshotInfo(fs, snapshotDir);
>> SnapshotReferenceUtil.visitReferencedFiles(conf, fs, snapshotDir,
>>     snapshotDesc, snapshotVisitor);
>>
>> where snapshotVisitor is an implementation of the SnapshotVisitor interface
>> found here:
>> https://github.com/cloudera/hbase/blob/cdh5-1.2.0_5.11.1/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotReferenceUtil.java#L63
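>>
>> A minimal visitor that just collects the referenced HFile names might look
>> like the following (this follows the 1.2-era interface linked above; the
>> exact callbacks vary between HBase versions, so check your own tree):
>>
>> final Set<String> hfiles = new HashSet<>();
>> SnapshotReferenceUtil.SnapshotVisitor snapshotVisitor =
>>     new SnapshotReferenceUtil.SnapshotVisitor() {
>>   @Override
>>   public void storeFile(HRegionInfo regionInfo, String familyName,
>>       SnapshotRegionManifest.StoreFile storeFile) throws IOException {
>>     hfiles.add(storeFile.getName());  // record each referenced HFile
>>   }
>>
>>   @Override
>>   public void logFile(String server, String logfile) throws IOException {
>>     // WAL references; usually not relevant for HFile cleanup
>>   }
>> };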
>>
>> 2) The ExportSnapshot class provides a private method that does this for
>> you. We ended up using reflection to call the private
>> ExportSnapshot#getSnapshotFiles (see
>> https://github.com/cloudera/hbase/blob/cdh5-1.2.0_5.11.1/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/ExportSnapshot.java#L539).
>> For example:
>>
>> Path snapshotPath = getCompletedSnapshotDir(snapshotName, rootDir);
>> Method method = ExportSnapshot.class.getDeclaredMethod("getSnapshotFiles",
>>     Configuration.class, FileSystem.class, Path.class);
>> method.setAccessible(true);
>> // invoke() returns Object, so the result needs an (unchecked) cast
>> List<Pair<SnapshotFileInfo, Long>> snapshotFiles =
>>     (List<Pair<SnapshotFileInfo, Long>>) method.invoke(null, conf, fs, snapshotPath);
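>>
>> From there you can filter the pairs down to HFiles, roughly like this
>> (assuming the protobuf's getType()/getHfile() accessors; check the
>> SnapshotFileInfo definition in your version):
>>
>> Set<String> hfiles = new HashSet<>();
>> for (Pair<SnapshotFileInfo, Long> pair : snapshotFiles) {
>>   SnapshotFileInfo info = pair.getFirst();
>>   if (info.getType() == SnapshotFileInfo.Type.HFILE) {
>>     hfiles.add(info.getHfile());  // path of a referenced HFile
>>   }
>> }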
>>
>> I would love to know how other people are tackling this issue as well.
>>
>> -Tim
>>
>> On Mon, Nov 20, 2017 at 7:45 PM, Lex Toumbourou <lex@scrunch.com> wrote:
>>
>> > Hi all,
>> >
>> > Wondering if I could get some help figuring out how to clean out old
>> > snapshots that have been exported to S3?
>> >
>> > We've been exporting snapshots to S3 using the export snapshot command:
>> >
>> > bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
>> > some-snapshot -copy-to s3a://some-bucket/hbase
>> >
>> >
>> > Now the size of the S3 bucket is getting a little out of control and I'd
>> > like to remove the old snapshots and let HBase garbage collect blocks no
>> > longer referenced.
>> >
>> > One idea I had was to spin up an entirely new cluster that uses the S3
>> > bucket as the hbase.rootdir, delete the snapshots as normal, and maybe
>> > use cleaner_run to clean up the old files, but it feels like overkill
>> > having to spin up an entire cluster.
>> >
>> > So my question is: what's the best approach for deleting snapshots
>> > exported to an S3 bucket and cleaning up old store files that are no
>> > longer referenced? We are using HBase 1.3.1 on EMR.
>> >
>> > Thanks!
>> >
>> > Lex Toumbourou
>> > CTO at scrunch.com <http://scrunch.com/>
>> >
>>
