hbase-user mailing list archives

From Alexander Batyrshin <0x62...@gmail.com>
Subject Re: Replication current progress at position -1
Date Fri, 08 Nov 2019 08:23:00 GMT
I had exactly this issue.
I fixed it by moving a region of the replicated table to the “stalled” region server.
Kim, thank you for the detailed answer, and good luck with the fix.
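
For anyone trying to find out which region server is “stalled”, a minimal Python sketch follows. It scans region server log text for Replication Statistics entries that report zero replicated edits at position -1, the symptom in the original report quoted below. The function name and the regular expression are illustrative assumptions based only on the log layout pasted in this thread; other HBase versions may format these lines differently, and only the first walGroup of each entry is inspected.

```python
import re

# Pattern derived from the log excerpt in this thread (an assumption, not an
# official format). \s+ tolerates the line wrapping seen in pasted logs.
STATS_RE = re.compile(
    r"Total\s+replicated\s+edits:\s*(?P<edits>\d+).*?"
    r"walGroup\s*\[(?P<group>[^\]]+)\].*?"
    r"at\s+position:\s*(?P<pos>-?\d+)",
    re.DOTALL,
)


def find_stalled_wal_groups(log_text):
    """Return WAL groups that report zero replicated edits at position -1."""
    # Strip mail quote markers ('>') and leading spaces from every line.
    cleaned = re.sub(r"^[> ]+", "", log_text, flags=re.MULTILINE)
    stalled = []
    for match in STATS_RE.finditer(cleaned):
        if int(match.group("edits")) == 0 and int(match.group("pos")) == -1:
            stalled.append(match.group("group"))
    return stalled


# One entry copied from the report quoted in this thread.
sample = """\
2019-11-06 18:10:54,252 INFO  [hbase07:60020Replication Statistics #0]
regionserver.Replication: Normal source for cluster lp_analytics: Total
replicated edits: 0, current progress:
walGroup [hbase07.prod.hbcluster%2C60020%2C1570464952456]: currently
replicating from:
hdfs://prodfashion01/hbase/WALs/hbase07.prod.hbcluster,60020,1570464952456/hbase07.prod.hbcluster%2C60020%2C1570464952456.1573051524020
at position: -1
"""

print(find_stalled_wal_groups(sample))
```

A position of -1 on its own may simply mean that a source has just started, so this is only a hint. It is worth checking whether the same WAL file and position repeat across several statistics intervals, as they do in the two entries quoted below.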

> On 8 Nov 2019, at 06:01, Jungdae Kim <kjd9306@gmail.com> wrote:
> 
> Hello, Alexander
> 
> HBase 1.4.x has some issues related to updating the position of WALs being
> replicated.
> One of them is that old WALs pile up when a region server has no
> regions of the table being replicated, or when no mutations come in for a while.
> 
> Judging from your logs alone, I'm not sure you have the same issue.
> If you do, you will find many old WALs in the HDFS
> oldWALs directory ({hbase.rootdir}/oldWALs) and in the ZooKeeper replication
> queues ({znodeParent}/replication/rs/{rs}/{peer}/). You can work around the
> issue by assigning a region of a replicated table to that region server.
> 
> The issue has already been reported (
> https://issues.apache.org/jira/browse/HBASE-22784) and resolved.
> Unfortunately, the patch introduced other issues, such as region
> servers aborting (https://issues.apache.org/jira/browse/HBASE-23169).
> 
> I'm working on these issues in
> https://issues.apache.org/jira/browse/HBASE-23205 (not merged yet)
> 
> I hope this will be helpful to you.
> 
> On Thu, Nov 7, 2019 at 12:30 AM Alexander Batyrshin <0x62ash@gmail.com>
> wrote:
> 
>> Hello all,
>> Sometimes we observe that replication is not working on HBase 1.4.10:
>> 
>>    hbase07.prod.hbcluster:
>>       SOURCE: PeerID=lp_analytics, AgeOfLastShippedOp=0,
>> SizeOfLogQueue=1, TimeStampsOfLastShippedOp=Thu Jan 01 03:00:00 MSK 1970,
>> Replication Lag=1573052815347
>>       SINK  : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Mon Oct 07
>> 19:15:54 MSK 2019
>> 
>> In the logs:
>> 
>> 2019-11-06 18:10:54,252 INFO  [hbase07:60020Replication Statistics #0]
>> regionserver.Replication: Normal source for cluster lp_analytics: Total
>> replicated edits: 0, current progress:
>> walGroup [hbase07.prod.hbcluster%2C60020%2C1570464952456]: currently
>> replicating from:
>> hdfs://prodfashion01/hbase/WALs/hbase07.prod.hbcluster,60020,1570464952456/hbase07.prod.hbcluster%2C60020%2C1570464952456.1573051524020
>> at position: -1
>> 2019-11-06 18:15:54,252 INFO  [hbase07:60020Replication Statistics #0]
>> regionserver.Replication: Normal source for cluster lp_analytics: Total
>> replicated edits: 0, current progress:
>> walGroup [hbase07.prod.hbcluster%2C60020%2C1570464952456]: currently
>> replicating from:
>> hdfs://prodfashion01/hbase/WALs/hbase07.prod.hbcluster,60020,1570464952456/hbase07.prod.hbcluster%2C60020%2C1570464952456.1573051524020
>> at position: -1
>> 
>> I can’t find any errors or anything else that could help me diagnose why
>> replication is not working on this node. On the other nodes replication works like
>> a charm.
>> Any ideas what is wrong?
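
A note on the status output above: TimeStampsOfLastShippedOp is shown as Thu Jan 01 03:00:00 MSK 1970, which is epoch 0 in a UTC+3 zone, and the reported Replication Lag is suspiciously close to the current time in milliseconds. The short Python sketch below reads the lag value as an epoch timestamp to make that visible. That the lag is being computed against a last-shipped timestamp of 0 is an interpretation of these numbers, not something confirmed in the thread, and the function name is purely illustrative.

```python
from datetime import datetime, timezone


def lag_as_wall_clock(lag_ms):
    """Interpret a lag in milliseconds as if it were an epoch timestamp (UTC)."""
    # Integer division drops the milliseconds; second resolution is enough here.
    return datetime.fromtimestamp(lag_ms // 1000, tz=timezone.utc)


reported_lag_ms = 1573052815347  # "Replication Lag" from the status output
print(lag_as_wall_clock(reported_lag_ms).isoformat())
# → 2019-11-06T15:06:55+00:00
```

15:06:55 UTC is 18:06:55 MSK on 6 November 2019, a few minutes before the first log entry in the report. This is consistent with a source that has not shipped a single edit, rather than one that is merely running behind.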

