hbase-user mailing list archives

From Marc Hoppins <marc.hopp...@eset.sk>
Subject RE: Region server idle
Date Mon, 11 Jan 2021 08:51:58 GMT
I tried. It appears to have failed reading data from hbase:meta. These errors repeat for
the whole run of list_quotas.

A balance task was run on Friday; it took 9+ hours. The affected host had 6 regions, and no
procedures, locks or processes were running for those 6 regions. Today that host has 8 regions,
with no real work being performed on them. The other server (which went idle as a result of
removing the hbase19 host from HBase and re-adding it) is still doing nothing and has no regions assigned.

I ran it as the hbase user (su - hbase, then hbase shell).

****************

HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.1.0-cdh6.3.2, rUnknown, Fri Nov  8 05:44:07 PST 2019
Took 0.0011 seconds
hbase(main):001:0> list_quotas
OWNER                                      QUOTAS
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=8, exceptions:
Mon Jan 11 09:16:46 CET 2021, RpcRetryingCaller{globalStartTime=1610353006298, pause=100,
maxAttempts=8}, javax.security.sasl.SaslException: Call to dr1-hbase18.jumbo.hq.com/10.1.140.36:16020
failed on local exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
[Caused by javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid
credentials provided (Mechanism level: Failed to find any Kerberos tgt)]]
Mon Jan 11 09:16:46 CET 2021, RpcRetryingCaller{globalStartTime=1610353006298, pause=100,
maxAttempts=8}, java.io.IOException: Call to dr1-hbase18.jumbo.hq.com/10.1.140.36:16020
failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
Mon Jan 11 09:16:46 CET 2021, RpcRetryingCaller{globalStartTime=1610353006298, pause=100,
maxAttempts=8}, java.io.IOException: Call to dr1-hbase18.jumbo.hq.com/10.1.140.36:16020
failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
Mon Jan 11 09:16:47 CET 2021, RpcRetryingCaller{globalStartTime=1610353006298, pause=100,
maxAttempts=8}, java.io.IOException: Call to dr1-hbase18.jumbo.hq.com/10.1.140.36:16020
failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
Mon Jan 11 09:16:47 CET 2021, RpcRetryingCaller{globalStartTime=1610353006298, pause=100,
maxAttempts=8}, java.io.IOException: Call to dr1-hbase18.jumbo.hq.com/10.1.140.36:16020
failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
Mon Jan 11 09:16:48 CET 2021, RpcRetryingCaller{globalStartTime=1610353006298, pause=100,
maxAttempts=8}, javax.security.sasl.SaslException: Call to dr1-hbase18.jumbo.hq.com/10.1.140.36:16020
failed on local exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
[Caused by javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid
credentials provided (Mechanism level: Failed to find any Kerberos tgt)]]
Mon Jan 11 09:16:50 CET 2021, RpcRetryingCaller{globalStartTime=1610353006298, pause=100,
maxAttempts=8}, java.io.IOException: Call to dr1-hbase18.jumbo.hq.com/10.1.140.36:16020
failed on local exception: java.io.IOException: Can not send request because relogin is in progress.
Mon Jan 11 09:16:54 CET 2021, RpcRetryingCaller{globalStartTime=1610353006298, pause=100,
maxAttempts=8}, javax.security.sasl.SaslException: Call to dr1-hbase18.jumbo.hq.com/10.1.140.36:16020
failed on local exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
[Caused by javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid
credentials provided (Mechanism level: Failed to find any Kerberos tgt)]]
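The timestamps in the output above are consistent with one client exhausting its retries against the same failure. A minimal sketch of the retry timing, assuming the backoff multipliers from HBase's HConstants.RETRY_BACKOFF ({1, 2, 3, 5, 10, 20, 40, 100, ...}) scaled by the logged pause=100 ms, and ignoring RPC time and the small random jitter the real client adds. attempt_offsets_ms is a hypothetical helper, not an HBase API.

```python
from typing import List

# ASSUMPTION: multipliers as in HBase's HConstants.RETRY_BACKOFF; the real client
# sleeps roughly pause * multiplier between attempts (plus up to ~1% jitter, omitted here).
RETRY_BACKOFF = [1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200]


def attempt_offsets_ms(pause_ms: int, max_attempts: int) -> List[int]:
    """Approximate start offset (ms) of each attempt, ignoring time spent in the RPC itself."""
    offsets = []
    elapsed = 0
    for tries in range(max_attempts):
        offsets.append(elapsed)
        # The sleep before the next attempt grows with the number of tries so far.
        idx = min(tries, len(RETRY_BACKOFF) - 1)
        elapsed += pause_ms * RETRY_BACKOFF[idx]
    return offsets


if __name__ == "__main__":
    # pause=100 and maxAttempts=8, as printed by RpcRetryingCaller in the log.
    for n, off in enumerate(attempt_offsets_ms(100, 8), start=1):
        print(f"attempt {n}: +{off / 1000:.1f} s")
```

Added to globalStartTime=1610353006298 (09:16:46.298 CET), these offsets put the attempts at roughly 46.3, 46.4, 46.6, 46.9, 47.4, 48.4, 50.4 and 54.4 seconds past 09:16, which lines up with the logged timestamps once a little RPC time is allowed for.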

-----Original Message-----
From: Stack <stack@duboce.net> 
Sent: Saturday, January 9, 2021 1:52 AM
To: Hbase-User <user@hbase.apache.org>
Subject: Re: Region server idle

Looking at the code around the exception, can you check your quota settings? See the refguide on
how to list quotas. Look for a table or namespace entry that is empty or non-existent and fill in
the missing portion.
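Judging only from the exception text ("Expected only one of namespace and tablename to be null"), each quota owner seems to need exactly one of a namespace or a table name. A minimal sketch of that check over entries copied by hand from list_quotas; QuotaOwner and invalid_owners are hypothetical helpers for illustration, not HBase classes, and this is not HBase's own validation code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QuotaOwner:
    """Hypothetical, simplified record for one OWNER entry copied from list_quotas."""
    namespace: Optional[str] = None
    table: Optional[str] = None


def invalid_owners(owners: List[QuotaOwner]) -> List[QuotaOwner]:
    """Return owners that do not have exactly one of namespace/table filled in.

    Empty strings are treated the same as missing values.
    """
    return [o for o in owners if bool(o.namespace) == bool(o.table)]


owners = [
    QuotaOwner(namespace="ns1"),           # namespace quota: one field set
    QuotaOwner(table="ns1:t1"),            # table quota: one field set
    QuotaOwner(namespace="", table=None),  # nothing filled in: would trip the check
]
print(invalid_owners(owners))
```

Any entry this flags is a candidate for the "empty or non-existent" owner described above.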

Is this the master-side log? It is from a periodic task, so perhaps something else is in the way
of the assignment. Is there anything else in there about balancing, or about why assignment to
these servers is being skipped? Try a balance run in the shell and then check the master log to
see why no work was done.
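After a balance run, a quick way to pull the relevant lines out of a large master log is a keyword filter. A minimal sketch; the keyword pattern is an assumption chosen for illustration (adjust it to what your master log actually contains), and the log path passed on the command line is hypothetical.

```python
import re
import sys
from typing import Iterable, List

# ASSUMPTION: illustrative keywords only; matched case-insensitively,
# so "ERROR" also matches "error".
PATTERN = re.compile(r"balanc|assign|ERROR", re.IGNORECASE)


def interesting_lines(lines: Iterable[str]) -> List[str]:
    """Keep log lines that mention balancing, assignment, or an error."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_master_log.py /path/to/hbase-master.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for line in interesting_lines(fh):
            print(line)
```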

S

On Fri, Jan 8, 2021 at 2:51 AM Marc Hoppins <marc.hoppins@eset.sk> wrote:

> Apologies again.  Here is the full error message.
>
> 2021-01-08 11:34:15,831 ERROR org.apache.hadoop.hbase.ScheduledChore: Caught error
> java.lang.IllegalStateException: Expected only one of namespace and tablename to be null
>         at org.apache.hadoop.hbase.quotas.SnapshotQuotaObserverChore.getSnapshotsToComputeSize(SnapshotQuotaObserverChore.java:198)
>         at org.apache.hadoop.hbase.quotas.SnapshotQuotaObserverChore._chore(SnapshotQuotaObserverChore.java:126)
>         at org.apache.hadoop.hbase.quotas.SnapshotQuotaObserverChore.chore(SnapshotQuotaObserverChore.java:113)
>         at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:186)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>
> -----Original Message-----
> From: Marc Hoppins <marc.hoppins@eset.sk>
> Sent: Friday, January 8, 2021 10:57 AM
> To: user@hbase.apache.org
> Subject: RE: Region server idle
>
> So, I tried decommissioning that RS and recommissioning it. No change;
> the server is still idle.
>
> I then tried decommissioning another server to see if HBase would set itself right.
> Now I have two RSs that are idle.
>
> ServerName                                   Start time                    Last contact  Version         Requests/s  Num. regions
> ba-hbase18.jumbo.hq.com,16020,1604413480001  Tue Nov 03 15:24:40 CET 2020  1 s           2.1.0-cdh6.3.2  13          471
> ba-hbase19.jumbo.hq.com,16020,1610095488001  Fri Jan 08 09:44:48 CET 2021  0 s           2.1.0-cdh6.3.2  0           6
> ba-hbase20.jumbo.hq.com,16020,1610096850259  Fri Jan 08 10:07:30 CET 2021  0 s           2.1.0-cdh6.3.2  0           0
> ba-hbase21.jumbo.hq.com,16020,1604414101652  Tue Nov 03 15:35:01 CET 2020  1 s           2.1.0-cdh6.3.2  15          447
>
> From the logs:
> 2021-01-08 10:25:36,875 ERROR org.apache.hadoop.hbase.ScheduledChore: Caught error
> java.lang.IllegalStateException: Expected only one of namespace and tablename to be null
>
> This keeps reappearing in the HBase master log.
>
> M
>
> -----Original Message-----
> From: Sean Busbey <busbey@apache.org>
> Sent: Thursday, January 7, 2021 7:30 PM
> To: Hbase-User <user@hbase.apache.org>
> Subject: Re: Region server idle
>
> Sounds like https://issues.apache.org/jira/browse/HBASE-24139
>
> The description of that jira has a workaround.
>
> On Thu, Jan 7, 2021, 05:23 Marc Hoppins <marc.hoppins@eset.sk> wrote:
>
> > Hi all,
> >
> > I have a setup with 67 region servers. On 29 Dec one system had to be
> > shut down to have its EMM module swapped out, which took one working day.
> > The host was back online on 30 Dec.
> >
> > My HBase knowledge is very basic, so I appreciate your patience.
> >
> > My understanding of the default setup is that a major compaction
> > should occur every 7 days. Moreover, am I right to assume that more
> > extensive balancing may occur after this happens?
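A worked check of the "every 7 days" assumption: with stock settings the periodic major compaction is spread out by a jitter, so for a given store it can land anywhere in a window around 7 days rather than exactly on day 7. A minimal sketch, assuming the stock defaults hbase.hregion.majorcompaction = 604800000 ms and hbase.hregion.majorcompaction.jitter = 0.5, applied symmetrically around the period (check hbase-site.xml for overrides). It only computes the compaction window and says nothing about balancing; major_compaction_window_days is a hypothetical helper.

```python
from typing import Tuple

# ASSUMPTION: stock defaults; a cluster may override either value in hbase-site.xml.
PERIOD_MS = 604_800_000          # hbase.hregion.majorcompaction (7 days)
JITTER = 0.5                     # hbase.hregion.majorcompaction.jitter
DAY_MS = 24 * 60 * 60 * 1000


def major_compaction_window_days(period_ms: int = PERIOD_MS,
                                 jitter: float = JITTER) -> Tuple[float, float]:
    """Earliest and latest interval, in days, between time-based major compactions."""
    spread = round(period_ms * jitter)
    return (period_ms - spread) / DAY_MS, (period_ms + spread) / DAY_MS


print(major_compaction_window_days())
```

With the defaults this gives a window of 3.5 to 10.5 days.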
> >
> > When I check the status of HBase via the HBase master UI, I see the
> > following:
> >
> > ServerName                                   Start time                    Last contact  Version         Requests/s  Num. regions
> > ba-hbase16.jumbo.hq.com,16020,1604413068640  Tue Nov 03 15:17:48 CET 2020  3 s           2.1.0-cdh6.3.2  46          462
> > ba-hbase17.jumbo.hq.com,16020,1604413274393  Tue Nov 03 15:21:14 CET 2020  1 s           2.1.0-cdh6.3.2  19          462
> > ba-hbase18.jumbo.hq.com,16020,1604413480001  Tue Nov 03 15:24:40 CET 2020  2 s           2.1.0-cdh6.3.2  62          461
> > ba-hbase19.jumbo.hq.com,16020,1609326754985  Wed Dec 30 12:12:34 CET 2020  2 s           2.1.0-cdh6.3.2  0           0
> > ba-hbase20.jumbo.hq.com,16020,1604413895967  Tue Nov 03 15:31:35 CET 2020  2 s           2.1.0-cdh6.3.2  62          503
> > ba-hbase21.jumbo.hq.com,16020,1604414101652  Tue Nov 03 15:35:01 CET 2020  3 s           2.1.0-cdh6.3.2  59          442
> > ba-hbase22.jumbo.hq.com,16020,1604414308289  Tue Nov 03 15:38:28 CET 2020  0 s           2.1.0-cdh6.3.2  40          438
> >
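With 67 region servers, idle ones are easier to spot mechanically than by eye. A minimal sketch that flags servers hosting fewer than a fraction of the mean region count, using the counts from the master UI table above; underloaded_servers and the 0.5-of-mean threshold are hypothetical choices for illustration, not an HBase feature.

```python
from statistics import mean
from typing import Dict, List


def underloaded_servers(regions: Dict[str, int], fraction: float = 0.5) -> List[str]:
    """Return servers hosting fewer than `fraction` times the mean region count."""
    avg = mean(regions.values())
    return sorted(name for name, count in regions.items() if count < fraction * avg)


# Region counts copied from the master UI table (host names shortened).
COUNTS = {
    "ba-hbase16": 462, "ba-hbase17": 462, "ba-hbase18": 461, "ba-hbase19": 0,
    "ba-hbase20": 503, "ba-hbase21": 442, "ba-hbase22": 438,
}

print(underloaded_servers(COUNTS))
```

On this excerpt only ba-hbase19 falls below the threshold; on the later four-server excerpt (471, 6, 0, 447 regions) both ba-hbase19 and ba-hbase20 would be flagged.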
> > Why, after more than 7 days, is this host not hosting more (any) regions?
> > Should I initiate some kind of rebalancing?
> >
> > Thanks in advance.
> >
> > M
> >
>