hbase-user mailing list archives

From Marc Hoppins <marc.hopp...@eset.sk>
Subject RE: Region in Transition
Date Mon, 22 Feb 2021 12:52:04 GMT
Sorry, it seems my cut/paste omitted the parent PID for most of these. The parent PID is the
initial DisableTableProcedure (73587).

Is there a sane method to kill child procedure IDs, then parent ID?

-----Original Message-----
From: Marc Hoppins <marc.hoppins@eset.sk> 
Sent: Monday, February 22, 2021 12:16 PM
To: user@hbase.apache.org
Subject: RE: Region in Transition

Hi all,

Further:

Table 'hds2_md5' is disabled. The following exist (in the locks/procedures list):

PID     PARENT  STATE            OWNER  PROCEDURE
73587           WAITING          hbase  DisableTableProcedure table=hds2_md5
73827   73587   WAITING_TIMEOUT  hbase  UnassignProcedure table=hds2_md5
73937           RUNNABLE         jumbo  AssignProcedure table=hds2_md5
73938           RUNNABLE         jumbo  AssignProcedure table=hds2_md5
73949           RUNNABLE         jumbo  EnableTableProcedure table=hds2_md5
78370           RUNNABLE         jumbo  AssignProcedure table=hds2_md5
78371           RUNNABLE         jumbo  AssignProcedure table=hds2_md5
78372           RUNNABLE         jumbo  AssignProcedure table=hds2_md5
87386           RUNNABLE         jumbo  EnableTableProcedure table=hds2_md5
123914          RUNNABLE         hbase  EnableTableProcedure table=hds2_md5

And then a whole bunch of these:
73588   73587   SUCCESS          hbase  UnassignProcedure table=hds2_md5
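
On the child-then-parent question, a minimal Python sketch (not an HBase tool) that parses a listing like the one above and orders the unfinished PIDs so that children come before their parents. The `LISTING` string, the function names, and the rule that a numeric second column is the parent PID are assumptions made for illustration. It only computes an order and does not talk to the master.

```python
from collections import defaultdict

# Rows copied from the procedure listing above (assumed whitespace-separated).
LISTING = """\
73587           WAITING         hbase   DisableTableProcedure table=hds2_md5
73827   73587   WAITING_TIMEOUT hbase   UnassignProcedure table=hds2_md5
73588   73587   SUCCESS         hbase   UnassignProcedure table=hds2_md5
73937           RUNNABLE        jumbo   AssignProcedure table=hds2_md5
73938           RUNNABLE        jumbo   AssignProcedure table=hds2_md5
73949           RUNNABLE        jumbo   EnableTableProcedure table=hds2_md5
78370           RUNNABLE        jumbo   AssignProcedure table=hds2_md5
78371           RUNNABLE        jumbo   AssignProcedure table=hds2_md5
78372           RUNNABLE        jumbo   AssignProcedure table=hds2_md5
87386           RUNNABLE        jumbo   EnableTableProcedure table=hds2_md5
123914          RUNNABLE        hbase   EnableTableProcedure table=hds2_md5
"""


def parse(listing):
    """Turn each non-empty row into a dict; the parent column is optional."""
    procs = []
    for line in listing.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        pid = int(tokens[0])
        # Assumption: a numeric second column is the parent PID.
        if tokens[1].isdigit():
            parent, rest = int(tokens[1]), tokens[2:]
        else:
            parent, rest = None, tokens[1:]
        procs.append({"pid": pid, "parent": parent, "state": rest[0],
                      "owner": rest[1], "type": rest[2]})
    return procs


def children_first(procs):
    """Return PIDs so that every child is listed before its parent."""
    known = {p["pid"] for p in procs}
    children = defaultdict(list)
    for p in procs:
        if p["parent"] in known:
            children[p["parent"]].append(p["pid"])

    order = []

    def visit(pid):
        for child in sorted(children[pid]):
            visit(child)
        order.append(pid)

    # Roots are procedures whose parent is absent from the filtered set.
    for p in procs:
        if p["parent"] not in known:
            visit(p["pid"])
    return order


# Only procedures that have not already finished are of interest.
pending = [p for p in parse(LISTING) if p["state"] != "SUCCESS"]
order = children_first(pending)
print(order)
```

With the rows above, the stuck UnassignProcedure 73827 is listed ahead of its parent DisableTableProcedure 73587, and the finished 73588 is left out.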

I am informed that the table can remain disabled afterwards.
What is the method to fix these issues? I have built an 'operator-tools1.0.0' jar and dropped
it in place on the active master.

Thanks again

Marc

-----Original Message-----
From: Marc Hoppins <marc.hoppins@eset.sk>
Sent: Friday, February 19, 2021 12:22 PM
To: user@hbase.apache.org
Subject: Region in Transition

Hi all,

The RIT message shows the following:

Owner procedure: { ID => '73827', PARENT_ID => '73587', STATE => 'WAITING_TIMEOUT',
OWNER => 'hbase',
TYPE => 'UnassignProcedure table=hds2_md5, region=f25fe93e24b34cb2f7fffddee1d89eec, server=ba-hbase25.jumbo.hq.com,16020,1604475904456',
START_TIME => 'Thu Feb 18 06:31:06 CET 2021', LAST_UPDATE => 'Fri Feb 19 10:49:20 CET 2021',
PARAMETERS => [ { transitionState => 'REGION_TRANSITION_DISPATCH',
  regionInfo => { regionId => '1535957697205',
    tableName => { namespace => 'ZGVmYXVsdA==', qualifier => 'aGRzMl9tZDU=' },
    startKey => 'QkRGRkVFRg==', endKey => 'QkVBQTgyMUQy',
    offline => 'false', split => 'false', replicaId => '0' },
  hostingServer => { hostName => 'ba-hbase25.jumbo.hq.eset.com', port => '16020', startCode => '1604475904456' },
  attempt => '179' } ] }
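
The namespace, qualifier and key fields in the message above look Base64-encoded. A minimal Python sketch to make them readable, assuming they are standard Base64 of ASCII bytes and that startCode is the region server start time in epoch milliseconds:

```python
import base64
from datetime import datetime, timezone

# Encoded fields copied from the RIT message above.
fields = {
    "namespace": "ZGVmYXVsdA==",
    "qualifier": "aGRzMl9tZDU=",
    "startKey": "QkRGRkVFRg==",
    "endKey": "QkVBQTgyMUQy",
}

# Assumption: each value is standard Base64 of printable ASCII bytes.
decoded = {name: base64.b64decode(value).decode("ascii")
           for name, value in fields.items()}

# Assumption: startCode is the server start time in epoch milliseconds.
start_code = 1604475904456
started = datetime.fromtimestamp(start_code / 1000, tz=timezone.utc)

for name, value in decoded.items():
    print(f"{name:10s} {value}")
print(f"{'startCode':10s} {started.isoformat()}")
```

The decoded table name is default:hds2_md5 and the decoded startKey is BDFFEEF, which matches the region name 'hds2_md5,BDFFEEF,1535957697205...' shown in the Master UI.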

The HBase Master UI (Table details) shows region
'hds2_md5,BDFFEEF,1535957697205.f25fe93e24b34cb2f7fffddee1d89eec' as being on
region server ba-hbase18.jumbo.hq.com

So, is the region hosted on server hbase25 and being moved to hbase18?

For some reason the table is not enabled at this time.

Table hds2_md5
Table Attributes

Attribute Name  Value  Description
Enabled         false  Is the table enabled
Compaction      NONE   Is the table compacting


The table has to be online to perform these kinds of moves, yes? A RIT is not going to occur
if the table is disabled, surely.

There was a network issue where net traffic went up on some paths as other paths went down.

So one question could be: was the table taken offline during this unassign? Then again, with
more than 30000 regions, it is likely that other assigns/unassigns were being carried out on
this and other tables.

Or was the table disabled with a view to performing some fix on this RIT? (Currently, the data
'owners' are unavailable for comment.) The table has been offline for at least one day.

One of the techies stopped the regionserver instance on the hbase25 node to try and force
some movement.

Thanks in advance.
