phoenix-dev mailing list archives

From "Karan Mehta (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-4953) DefaultStatisticsCollector fails to capture the last guidepost of every region
Date Fri, 05 Oct 2018 00:02:00 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karan Mehta updated PHOENIX-4953:
---------------------------------
    Description: 
The issue was found during a sanity test run, when the total row count summed across all guideposts
didn't match the actual number of rows in the table.

{{DefaultStatisticsCollector#collectStatistics()}} iterates over a list of cells and keeps track
of the cumulative size of the KVs. When that size exceeds the guidepost width, it adds an entry to
{{GuidePostsInfo}} using the {{GuidePostsInfoBuilder#addGuidePostOnCollection()}} method.
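
To make the failure mode concrete, here is a minimal, hypothetical Java model of threshold-based
collection as described above. It is not Phoenix's actual {{DefaultStatisticsCollector}}: the
{{Row}} and {{Guidepost}} records, the {{collect}} and {{totalRows}} helpers, and the strict
{{>}} comparison against the width are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of threshold-based guidepost collection.
 * Not Phoenix's DefaultStatisticsCollector; all names here are illustrative.
 */
public class GuidepostSketch {

    /** One row as seen by the collector: its key and total KV size in bytes. */
    record Row(String key, long sizeBytes) {}

    /** A collected guidepost: the row key where it was cut, plus the bytes and rows it covers. */
    record Guidepost(String key, long byteCount, long rowCount) {}

    /** Emits a guidepost each time the accumulated size exceeds guidePostWidth. */
    static List<Guidepost> collect(List<Row> rows, long guidePostWidth) {
        List<Guidepost> guideposts = new ArrayList<>();
        long byteCount = 0;
        long rowCount = 0;
        for (Row row : rows) {
            byteCount += row.sizeBytes();
            rowCount++;
            if (byteCount > guidePostWidth) {
                guideposts.add(new Guidepost(row.key(), byteCount, rowCount));
                byteCount = 0;
                rowCount = 0;
            }
        }
        // Bug being modeled: rows accumulated after the last guidepost are dropped here.
        return guideposts;
    }

    /** Sums the row counts of all guideposts, as the sanity test would. */
    static long totalRows(List<Guidepost> guideposts) {
        return guideposts.stream().mapToLong(Guidepost::rowCount).sum();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("a", 40), new Row("b", 70),  // 110 bytes > 100: guidepost at "b"
            new Row("c", 60), new Row("d", 50),  // 110 bytes > 100: guidepost at "d"
            new Row("e", 30));                   // 30 bytes, below the width: never emitted
        List<Guidepost> gps = collect(rows, 100);
        System.out.println(gps.size() + " guideposts, " + totalRows(gps) + " of " + rows.size() + " rows");
    }
}
```

Running {{main}} prints {{2 guideposts, 4 of 5 rows}}: the 30-byte tail at row {{e}} is never
recorded, which is the same kind of mismatch the sanity test caught.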

However, for the last batch of rows, which doesn't cross the GUIDE_POSTS_WIDTH threshold, the
code doesn't create any entry through the Builder class. Ideally, we would cover that case by
introducing a small guidepost with the corresponding row key and the size of that guidepost
(we can persist both to the SYSTEM.STATS table). This is also justified because
GUIDE_POSTS_WIDTH is only an estimate, a best-effort target for distributing the data.
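
A sketch of the proposed fix, again under a hypothetical, simplified model rather than Phoenix
code: once the scan finishes, any rows left below the width threshold are flushed into one final,
smaller guidepost. The {{Row}}/{{Guidepost}} records and the {{collect}} helper are illustrative
assumptions, and keying the tail guidepost by the last row seen is one reading of "the
corresponding row key".

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the proposed fix: after the scan, rows that never crossed the
 * width threshold are flushed into one final, smaller guidepost.
 * Names are illustrative and do not correspond to Phoenix classes.
 */
public class GuidepostTailSketch {

    /** One row: its key and total KV size in bytes. */
    record Row(String key, long sizeBytes) {}

    /** A guidepost: cut-point row key, plus the bytes and rows it covers. */
    record Guidepost(String key, long byteCount, long rowCount) {}

    static List<Guidepost> collect(List<Row> rows, long guidePostWidth) {
        List<Guidepost> guideposts = new ArrayList<>();
        long byteCount = 0;
        long rowCount = 0;
        String lastKey = null;
        for (Row row : rows) {
            byteCount += row.sizeBytes();
            rowCount++;
            lastKey = row.key();
            if (byteCount > guidePostWidth) {
                guideposts.add(new Guidepost(lastKey, byteCount, rowCount));
                byteCount = 0;
                rowCount = 0;
            }
        }
        // The fix: the tail of the region gets its own (possibly small) guidepost,
        // keyed by the last row seen, so no rows are missing from the statistics.
        if (rowCount > 0) {
            guideposts.add(new Guidepost(lastKey, byteCount, rowCount));
        }
        return guideposts;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("a", 40), new Row("b", 70),
            new Row("c", 60), new Row("d", 50),
            new Row("e", 30));
        List<Guidepost> gps = collect(rows, 100);
        long counted = gps.stream().mapToLong(Guidepost::rowCount).sum();
        System.out.println(gps.size() + " guideposts, " + counted + " of " + rows.size() + " rows");
    }
}
```

With the same five rows and a width of 100, this prints {{3 guideposts, 5 of 5 rows}}; the extra
guidepost covers only row {{e}} (30 bytes, 1 row), so the per-guidepost row counts now add up
to the table's row count.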

  was:
The issue was found during a sanity test run, when the total row count summed across all guideposts
didn't match the actual number of rows in the table.

`DefaultStatisticsCollector#collectStatistics()` iterates over a list of cells and keeps track
of the cumulative size of the KVs. When that size exceeds the guidepost width, it adds an entry to
`GuidePostsInfo` using the `GuidePostsInfoBuilder#addGuidePostOnCollection()` method.

However, for the last batch of rows, which doesn't cross the GUIDE_POSTS_WIDTH threshold, the
code doesn't create any entry through the Builder class. Ideally, we would cover that case by
introducing a small guidepost with the corresponding row key and the size of that guidepost
(we can persist both to the SYSTEM.STATS table). This is also justified because
GUIDE_POSTS_WIDTH is only an estimate, a best-effort target for distributing the data.


> DefaultStatisticsCollector fails to capture the last guidepost of every region
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4953
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4953
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Karan Mehta
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
