phoenix-dev mailing list archives

From "Geoffrey Jacoby (Jira)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-5645) BaseScannerRegionObserver should prevent compaction from purging very recently deleted cells
Date Tue, 14 Jan 2020 22:56:00 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Geoffrey Jacoby updated PHOENIX-5645:
-------------------------------------
    Description: 
Phoenix's SCN feature can return incorrect results because HBase major compaction can remove
Cells that have been deleted, or that have expired because of the table's TTL or max versions
setting.

For example, IndexTool rebuilds and index scrutiny can both give strange, incorrect results
if a major compaction occurs in the middle of their run. In the rebuild case, it's because
we're rewriting "history" on the index at the same time that compaction is rewriting "history"
by purging deleted and expired cells. 
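
To make the failure mode concrete, here is a toy Java model (not HBase or Phoenix code; the
class and method names are hypothetical). It assumes an idealized SCN read that sees exactly
the versions written at or before the SCN, and a simplified major compaction that purges a
delete marker together with every version it masks. Real HBase delete and version semantics
are more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Toy model only: not HBase or Phoenix code. One column's history is a list of
// timestamped versions; a null value stands for a delete marker.
public class ScnCompactionDemo {

    record Version(long ts, String value) {
        boolean isDelete() { return value == null; }
    }

    // Idealized SCN read: the newest version with ts <= scn wins;
    // if that version is a delete marker, nothing is visible.
    static Optional<String> readAsOf(List<Version> history, long scn) {
        Version newest = null;
        for (Version v : history) {
            if (v.ts() <= scn && (newest == null || v.ts() > newest.ts())) {
                newest = v;
            }
        }
        if (newest == null || newest.isDelete()) {
            return Optional.empty();
        }
        return Optional.of(newest.value());
    }

    // Simplified major compaction with no lookback protection: the newest delete
    // marker and every version at or below its timestamp are purged.
    static List<Version> majorCompact(List<Version> history) {
        long newestDeleteTs = Long.MIN_VALUE;
        for (Version v : history) {
            if (v.isDelete()) {
                newestDeleteTs = Math.max(newestDeleteTs, v.ts());
            }
        }
        List<Version> kept = new ArrayList<>();
        for (Version v : history) {
            if (v.ts() > newestDeleteTs) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // v1 written at t=100, deleted at t=200.
        List<Version> history = List.of(new Version(100, "v1"), new Version(200, null));
        System.out.println(readAsOf(history, 150));               // Optional[v1]
        System.out.println(readAsOf(majorCompact(history), 150)); // Optional.empty
    }
}
```

The same SCN query returns different answers depending on whether a major compaction happened
to run in between, which is the "rewriting history" problem described above.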

Create a new configuration property called "max lookback age", which declares that no data
written more recently than the max lookback age will be compacted away. The max lookback age
must be smaller than the TTL, and it should not be legal for a user to look back further in
the past than the table's TTL. 
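
The rules in the paragraph above can be sketched as three small checks. This is a hypothetical
helper, not Phoenix's API: it assumes all times are epoch milliseconds, treats "written more
recently than the max lookback age" as a strict comparison, and adds a positivity check on the
max lookback age that the description does not state.

```java
// Hypothetical helper illustrating the proposed rules; not Phoenix's API.
public class MaxLookbackRules {

    // True if a cell written at cellTs falls inside the max lookback window
    // and therefore must not be purged by compaction.
    static boolean isShielded(long cellTs, long nowMs, long maxLookbackMs) {
        return cellTs > nowMs - maxLookbackMs;
    }

    // The max lookback age must be smaller than the table's TTL.
    // (The positivity check is an added sanity check, not from the proposal.)
    static void validateMaxLookback(long maxLookbackMs, long ttlMs) {
        if (maxLookbackMs <= 0 || maxLookbackMs >= ttlMs) {
            throw new IllegalArgumentException(
                "max lookback age must be positive and smaller than the TTL");
        }
    }

    // A lookback (SCN) query may not reach further into the past than the TTL.
    static void validateScn(long scnMs, long nowMs, long ttlMs) {
        if (scnMs < nowMs - ttlMs) {
            throw new IllegalArgumentException("SCN is older than the table's TTL");
        }
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long maxLookback = 60_000L; // 60 s, illustrative value
        long ttl = 3_600_000L;      // 1 h, illustrative value
        validateMaxLookback(maxLookback, ttl);
        System.out.println(isShielded(now - 30_000L, now, maxLookback)); // true
        System.out.println(isShielded(now - 90_000L, now, maxLookback)); // false
        validateScn(now - 120_000L, now, ttl); // 2 min back, within the 1 h TTL
    }
}
```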

By default, max lookback age will not be set, so the current behavior will be preserved. If max
lookback age is set, the BaseScannerRegionObserver will enforce it for all tables.
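
A toy sketch of how an optional max lookback age could gate purging during a major compaction
(not the actual BaseScannerRegionObserver code; names are hypothetical). An empty OptionalLong
stands for "not set" and reproduces today's purge behavior; a present value keeps every version
written inside the window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

// Toy model only: not the BaseScannerRegionObserver implementation.
public class LookbackAwareCompaction {

    // A null value stands for a delete marker.
    record Version(long ts, String value) {
        boolean isDelete() { return value == null; }
    }

    // Simplified major compaction of one column. Normally the newest delete marker
    // and every version at or below its timestamp are purged. When maxLookbackMs is
    // present, versions written inside the window are kept regardless.
    static List<Version> compact(List<Version> history, long nowMs, OptionalLong maxLookbackMs) {
        long newestDeleteTs = Long.MIN_VALUE;
        for (Version v : history) {
            if (v.isDelete()) {
                newestDeleteTs = Math.max(newestDeleteTs, v.ts());
            }
        }
        List<Version> kept = new ArrayList<>();
        for (Version v : history) {
            boolean purgeable = v.ts() <= newestDeleteTs;
            boolean shielded = maxLookbackMs.isPresent()
                && v.ts() > nowMs - maxLookbackMs.getAsLong();
            if (!purgeable || shielded) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long now = 1_000L;
        // v1 written at t=920 and deleted at t=950, both within 100 ms of "now".
        List<Version> history = List.of(new Version(920, "v1"), new Version(950, null));
        System.out.println(compact(history, now, OptionalLong.empty()).size()); // 0: not set, purged as today
        System.out.println(compact(history, now, OptionalLong.of(100L)).size()); // 2: shielded by the window
    }
}
```

This applies only the literal rule. In this toy model, a put written before the window but
masked by a delete marker inside the window would still be purged, so an SCN read inside the
window could miss it; a real implementation would have to decide how to treat that newest
version just outside the window.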


In the future, this should be contributed as a general feature to HBase for arbitrary tables.
See HBASE-23602.

  was:
IndexTool rebuilds and index scrutiny can both give strange, incorrect results if a major
compaction occurs in the middle of their run. In the rebuild case, it's because we're rewriting
"history" on the index at the same time that compaction is rewriting "history" by purging
deleted and expired cells. 

In the case of scrutiny, it's because it does an SCN-based lookback, and if versions are purged
on the index before their equivalent data table rows, you can get false errors. 

Since in the new indexing path we already have a coprocessor on each index, it should override
the compaction hook to shield rows newer than some configurable age from being purged during
a major compaction.

In the future, this should be contributed as a general feature to HBase for arbitrary tables.


        Summary: BaseScannerRegionObserver should prevent compaction from purging very recently
deleted cells  (was: GlobalIndexChecker should prevent compaction from purging very recently
deleted cells)

> BaseScannerRegionObserver should prevent compaction from purging very recently deleted cells
> --------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5645
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5645
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Geoffrey Jacoby
>            Assignee: Geoffrey Jacoby
>            Priority: Major
>         Attachments: PHOENIX-5645-4.x-HBase-1.5-v2.patch, PHOENIX-5645-4.x-HBase-1.5.patch, PHOENIX-5645-4.x-HBase-1.5.v3.patch
>
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
