hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: removing cells in minor compaction
Date Fri, 23 Jun 2017 19:06:54 GMT
(Disclaimer: my previous message did not involve verification in code or
turning up test cases to prove my assertions. For example, the
documentation claims that we retain versions beyond the max configured when
we do minor compactions but I do not see in code how that is done. Perhaps
this is how it used to be. Need to dig more).

On Mon, Jun 19, 2017 at 8:27 AM, Dave Latham <latham@davelink.net> wrote:

> And for any of the cases - if not, then why not?  Because that hasn't been
> implemented, or there's an actual reason that HBase would not want to do
> it?

Being able to delete in minor compaction would be an improvement; we are
reading the data anyway.

Traditionally, the spoke in the wheel is the fact that we allow edits to
come in in any order -- clients can write an edit into the past or into the
future -- so we can't be sure at compaction time that we see edits in their
insert order. If sequenceid were a first-class attribute of Cells, always
present, we could rely on it to establish order.
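
To make the ordering problem concrete, here is a minimal sketch (plain
Python, not HBase code). The Cell class and its seq_id field are
hypothetical stand-ins for a sequenceid that is always present on a cell;
the timestamps are client-supplied and can point into the past or future.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    timestamp: int  # client-supplied; may be in the past or the future
    seq_id: int     # hypothetical always-present sequence id (arrival order)
    value: str


# Arrival order: the client writes "now", then back-dates, then post-dates.
arrivals = [
    Cell("r1", timestamp=100, seq_id=1, value="a"),
    Cell("r1", timestamp=50, seq_id=2, value="b"),   # written into the past
    Cell("r1", timestamp=900, seq_id=3, value="c"),  # written into the future
]

# Sorted the way versions are ranked: newest timestamp first.
by_timestamp = sorted(arrivals, key=lambda c: c.timestamp, reverse=True)

# With a sequence id on every cell, arrival order can be recovered.
by_seq_id = sorted(by_timestamp, key=lambda c: c.seq_id)

print([c.value for c in by_timestamp])  # ['c', 'a', 'b']
print([c.value for c in by_seq_id])     # ['a', 'b', 'c']
```

Timestamp order and insert order disagree as soon as one client back-dates
or post-dates an edit, which is why timestamps alone can't tell a
compaction what arrived first.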

Absent sequenceid, minor compactions always select an adjacent run of files
(adjacent in the order in which they were flushed) from the files in the
store; with this precept, we know we can safely remove versions if, within
that subset, we've encountered more than the configured max versions.
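
A rough sketch of that reasoning, again in plain Python rather than HBase
code. The compact function, the tuple layout and MAX_VERSIONS = 2 are
assumptions for illustration; delete markers, TTL and same-timestamp ties
are ignored, and per the disclaimer above none of this has been checked
against the actual compaction code.

```python
from collections import defaultdict

MAX_VERSIONS = 2  # assumed column-family setting, for illustration only


def compact(files, max_versions=MAX_VERSIONS):
    """Merge a run of files, keeping the newest max_versions per column.

    Each file is a list of (row, column, timestamp, value) tuples.
    """
    by_column = defaultdict(list)
    for f in files:
        for row, col, ts, value in f:
            by_column[(row, col)].append((ts, value))

    merged = []
    for (row, col), versions in sorted(by_column.items()):
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first
        for ts, value in versions[:max_versions]:
            merged.append((row, col, ts, value))
    return merged


# Store files in flush order, oldest first.
store_files = [
    [("r1", "cf:q", 10, "v10")],
    [("r1", "cf:q", 20, "v20")],
    [("r1", "cf:q", 30, "v30")],
    [("r1", "cf:q", 40, "v40")],
]

# A minor compaction picks an adjacent run, here the three newest files.
selected = store_files[1:4]
compacted = compact(selected)
print(compacted)
```

The subset holds three versions of r1/cf:q, one more than the assumed max,
so the version at timestamp 20 is dropped. Files outside the selection can
only add further versions, so a cell that ranks beyond the max inside the
subset also ranks beyond it across the whole store.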

> With reads for a custom time range, it's possible to still read data that
> is waiting to be GCed from one of the above mechanisms and will disappear
> after that happens.  Doing the GC during minor compactions as well as major
> ones would change that visibility window, but doesn't seem to change that
> odd behavior that is there to begin with.

Should we support retaining deletes even on major compactions for some
user-configured period?

Thanks D,

P.S. This section needs a tuneup:

> On Wed, Jun 14, 2017 at 5:51 PM, Dave Latham <latham@davelink.net> wrote:
> > What cells, if any, are removed during minor compactions?
> >
> > Cells that
> > (a) are beyond the TTL?
> > (b) are shadowed by a delete marker? (from the files compacted)
> > (c) are shadowed by newer versions? (assuming numVersions configured <
> > num versions of the cell found)
> >
