hbase-user mailing list archives

From Akmal Abbasov <akmal.abba...@icloud.com>
Subject Re: HBase strange behaviour
Date Tue, 07 Jul 2015 07:05:38 GMT
> Have you run the following command in hbase shell ?
> balance_switch true
I’ve tried, and this did the trick. Thank you.
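
For anyone else hitting the same symptom, the following is a minimal Python sketch (not an HBase tool) for checking what the balancer switch was last set to according to the master log. It assumes the log line format shown in the master log excerpt quoted further down in this thread; the names `SWITCH_RE` and `last_balance_switch` are hypothetical and exist only for this illustration.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed format, taken from the master log lines quoted in this thread:
#   master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false
SWITCH_RE = re.compile(
    r"master\.HMaster: Client=(\S+) set balanceSwitch=(true|false)"
)


def last_balance_switch(lines: Iterable[str]) -> Optional[Tuple[str, bool]]:
    """Return (client, enabled) for the most recent balanceSwitch change.

    Returns None if no such line is present in the input.
    """
    last = None
    for line in lines:
        match = SWITCH_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the final value wins.
            last = (match.group(1), match.group(2) == "true")
    return last


# Usage (the log path below is hypothetical):
#   with open("/var/log/hbase/hbase-master.log") as f:
#       print(last_balance_switch(f))
```

If the last recorded value is false, the balancer switch is off, which would be consistent with a manual balancer run returning false until `balance_switch true` is issued, as happened here.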

One more thing that is not clear to me is what I can do with the ~4000 znodes in
/hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot.
What will happen to them if I do nothing? Will the system try to complete all of these
applications?

Thank you.


> On 07 Jul 2015, at 00:16, Ted Yu <yuzhihong@gmail.com> wrote:
> 
> Have you run the following command in hbase shell ?
> balance_switch true
> 
> Cheers
> 
> On Mon, Jul 6, 2015 at 12:16 PM, Akmal Abbasov <akmal.abbasov@icloud.com>
> wrote:
> 
>>> Do you see in the master log something similar to the following ?
>>> 
>>> master.HMaster: Not running balancer because 1 region(s) in transition
>> Yes, I have several of them, but all of them are from 3 days ago.
>> 
>> I checked the ‘ritCount’ metric, and it is 0. I also checked the
>> /hbase/region-in-transition znode, which is also empty.
>> But I can’t start the balancer manually.
>> 
>> I took a snapshot of the tables each hour.
>> I’ve checked the path
>> /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot in ZooKeeper,
>> and there
>> are ~4000 applications. It looks like all of them are snapshot-creation
>> operations. I’ve also observed that the CPU
>> usage of the master is much higher than it was in the past.
>> Is it possible that all of these applications are causing the problem?
>> 
>> Can I delete all of these applications?
>> 
>> 
>>> On 06 Jul 2015, at 18:45, Ted Yu <yuzhihong@gmail.com> wrote:
>>> 
>>> Do you see in the master log something similar to the following ?
>>> 
>>> master.HMaster: Not running balancer because 1 region(s) in transition
>>> 
>>> You can search backwards for balancer / assignment related logs.
>>> 
>>> Cheers
>>> 
>>> On Mon, Jul 6, 2015 at 8:49 AM, Akmal Abbasov <akmal.abbasov@icloud.com>
>>> wrote:
>>> 
>>>>> What error(s) did you get when trying to restart the region server ? Have
>>>>> you checked its log files ?
>>>> It was a VM, and I was not able to access it any more; I can’t log in to
>>>> it. Restarting it several times didn’t help.
>>>> 
>>>> 
>>>>> Can you check master log around this time ? If there was region in
>>>>> transition, balancer wouldn't balance.
>>>> I have a lot of these:
>>>> 2015-07-06 15:15:39,918 INFO  [snapshot-log-cleaner-cache-refresher]
>>>> util.FSVisitor: No logs under
>>>> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_18.14/WALs
>>>> 2015-07-06 15:15:39,918 INFO  [snapshot-log-cleaner-cache-refresher]
>>>> util.FSVisitor: No logs under
>>>> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_19.14/WALs
>>>> 2015-07-06 15:15:39,921 INFO  [snapshot-log-cleaner-cache-refresher]
>>>> util.FSVisitor: No logs under
>>>> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_20.13/WALs
>>>> 2015-07-06 15:15:39,925 INFO  [snapshot-log-cleaner-cache-refresher]
>>>> util.FSVisitor: No logs under
>>>> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_21.14/WALs
>>>> 2015-07-06 15:15:39,926 INFO  [snapshot-log-cleaner-cache-refresher]
>>>> util.FSVisitor: No logs under
>>>> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_22.14/WALs
>>>> 2015-07-06 15:15:39,927 INFO  [snapshot-log-cleaner-cache-refresher]
>>>> util.FSVisitor: No logs under
>>>> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_23.14/WALs
>>>> 2015-07-06 15:15:39,928 INFO  [snapshot-log-cleaner-cache-refresher]
>>>> util.FSVisitor: No logs under
>>>> directory:hdfs://test/hbase/.hbase-snapshot/testsnap/WALs
>>>> 2015-07-06 15:15:47,324 INFO  [FifoRpcScheduler.handler1-thread-18]
>>>> master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false
>>>> 2015-07-06 15:23:31,265 DEBUG [master:hbase-m2:60000.oldLogCleaner]
>>>> master.ReplicationLogCleaner: Didn't find this log in ZK, deleting:
>>>> hbase-rs1%2C60020%2C1436189457794.1436190023718
>>>> 2015-07-06 15:23:31,504 DEBUG [master:hbase-m2:60000.oldLogCleaner]
>>>> master.ReplicationLogCleaner: Didn't find this log in ZK, deleting:
>>>> hbase-rs1%2C60020%2C1436189457794.1436193624562
>>>> 2015-07-06 15:32:49,382 INFO  [FifoRpcScheduler.handler1-thread-14]
>>>> master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
>>>> 2015-07-06 15:32:56,936 INFO  [FifoRpcScheduler.handler1-thread-1]
>>>> master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
>>>> 
>>>> Thank you.
>>>> 
>>>>> On 06 Jul 2015, at 17:37, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>> 
>>>>> bq. I had to delete and recreate it
>>>>> 
>>>>> What error(s) did you get when trying to restart the region server ? Have
>>>>> you checked its log files ?
>>>>> 
>>>>> bq. start balancer manually, but it returned false
>>>>> 
>>>>> Can you check master log around this time ? If there was region in
>>>>> transition, balancer wouldn't balance.
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> On Mon, Jul 6, 2015 at 8:29 AM, Akmal Abbasov <akmal.abbasov@icloud.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi all,
>>>>>> I am seeing strange behaviour in my HBase cluster. I have 5 rs and 2 masters.
>>>>>> One of the rs stopped working, restarting it didn’t work, and I had to delete
>>>>>> and recreate it.
>>>>>> But when this rs stopped, the cluster also stopped functioning.
>>>>>> There were a lot of inconsistencies. When I recreated the rs with the disks of
>>>>>> the previous one, the cluster started working.
>>>>>> But now only 3 rs host regions; the other 2 have 0 regions.
>>>>>> I’ve tried to start the balancer manually, but it returned false.
>>>>>> Any idea?
>>>>>> 
>>>>>> I am using HBase hbase-0.98.7-hadoop2.
>>>>>> Thank you.
>>>>>> 
>>>>>> Kind regards,
>>>>>> Akmal Abbasov
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 

