hbase-user mailing list archives

From PRANEESH KUMAR <praneesh.san...@gmail.com>
Subject HConnection thread waiting on blocking queue indefinitely
Date Wed, 17 Jun 2015 12:26:05 GMT
Hi Ted,

Even in HBase 1.1.0, the client connection stalls when a region split occurs.

Thread dump of master http://pastebin.com/rTFyuAqC

Thread dump of RS http://pastebin.com/Vf52Z1ni

The client waits in the same BoundedCompletionService.take call.

Regards,
Praneesh
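
For readers following the thread, the behaviour described above (a thread parked in take() with no exception and no timeout) can be illustrated with the JDK's own completion service. This is only a minimal sketch, not HBase code: it uses java.util.concurrent.ExecutorCompletionService as a stand-in for HBase's internal BoundedCompletionService, and the class name TakeVsPoll and method pollTimesOut are hypothetical names chosen for the illustration. It shows that take() has no upper bound on its wait, while poll(timeout, unit) returns null once the timeout elapses.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of HBase.
public class TakeVsPoll {

    /**
     * Submits one task that never completes on its own (a stand-in for an
     * RPC whose reply never arrives), then waits with a bounded poll().
     * Returns true if the wait timed out without any completed task.
     */
    static boolean pollTimesOut(long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        ExecutorCompletionService<String> cs = new ExecutorCompletionService<>(pool);
        CountDownLatch neverReleased = new CountDownLatch(1); // never counted down
        try {
            cs.submit(() -> {
                neverReleased.await(); // blocks until interrupted
                return "done";
            });
            // cs.take() at this point would block indefinitely, because no
            // task ever completes. poll() with a timeout gives up instead.
            Future<String> result = cs.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            return result == null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while polling", e);
        } finally {
            pool.shutdownNow(); // interrupts the stuck task so the JVM can exit
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + pollTimesOut(200));
    }
}
```

Running main prints "timed out: true". Whether the HBase client can be made to use a bounded wait here is a separate question that this sketch does not answer.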

On Thursday 11 June 2015, Ted Yu <yuzhihong@gmail.com> wrote:

> Looking at the revision history for ClientSmallReversedScanner.java which
> appeared in the stack trace, there have been several bug fixes on top of
> the hbase release you're using.
>
> Can you try hbase 1.1.0 to see if the problem can be reproduced (in cluster
> deployment) ?
>
> Thanks
>
> On Tue, Jun 9, 2015 at 11:42 PM, mukund murrali <mukundmurrali9@gmail.com>
> wrote:
>
> > Kindly look into this for full trace of RS.
> > http://pastebin.com/VS17vVd8
> >
> > Thanks
> >
> > On Wed, Jun 10, 2015 at 11:35 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Can you pastebin the complete stack trace for the region server ?
> > >
> > > Thanks
> > >
> > >
> > >
> > > > > On Jun 9, 2015, at 10:52 PM, mukund murrali <mukundmurrali9@gmail.com> wrote:
> > > > >
> > > > > We are using HBase-1.0.0. Just before the client stalled, a few
> > > > > handler threads in the RS were blocked on the MVCC check (thread
> > > > > stack below). Not sure whether that could cause a problem. I don't
> > > > > see anything unusual in the RS threads. Also, the same client can
> > > > > connect to the regionserver after a restart. What we are confused
> > > > > about is what was causing the problem at that instant.
> > > >
> > > >
> > > > > java.lang.Thread.State: BLOCKED (on object monitor)
> > > > >        at java.lang.Object.wait(Native Method)
> > > > >        at org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > >        - locked <0x00000007ac0e0e88> (a java.util.LinkedList)
> > > > >        at org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.completeMemstoreInsertWithSeqNum(MultiVersionConsistencyControl.java:127)
> > > > >        at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2822)
> > > > >        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2476)
> > > > >        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2430)
> > > > >        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2434)
> > > > >        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:640)
> > > > >        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:604)
> > > > >        at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1832)
> > > > >        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31313)
> > > > >        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
> > > > >        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > >        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > >        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > >        at java.lang.Thread.run(Thread.java:745)
> > > >
> > > >
> > > >
> > > >
> > > > >> On Tue, Jun 9, 2015 at 6:48 PM, Anoop John <anoop.hbase@gmail.com> wrote:
> > > > >>
> > > > >> Can you see what the threads at the RS are doing at this time?
> > > > >> Handlers mainly. Which version of HBase?
> > > >>
> > > > >>> On Tuesday, June 9, 2015, mukund murrali <mukundmurrali9@gmail.com> wrote:
> > > > >>> Hi
> > > > >>>
> > > > >>> I wrote a sample program with default client configurations and
> > > > >>> created a single connection. I spawn more client threads than
> > > > >>> hbase.hconnection.threads.max from my client application, and each
> > > > >>> thread inserts data into the HBase cluster. Once a region split
> > > > >>> happened, all the hconnection threads (core pool and max pool size
> > > > >>> were kept at 256) stalled at BoundedCompletionService.take()
> > > > >>> indefinitely. Even after the split completed, they never resumed.
> > > > >>>
> > > > >>> So does it mean I have to create more instances of the connection
> > > > >>> object for a cluster in such scenarios (which is really not needed)?
> > > > >>> There was also no exception on the client side (I expected a
> > > > >>> RejectedExecution). So can changing hbase.hconnection.threads.max and
> > > > >>> hbase.hconnection.threads.core create such a problem?
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Sat, Jun 6, 2015 at 5:02 PM, ramkrishna vasudevan <
> > > >>> ramkrishna.s.vasudevan@gmail.com> wrote:
> > > >>>
> > > > >>>> Not very sure what the problem could be when the meta update
> > > > >>>> happened. I would think that when the region split happened, there
> > > > >>>> was some issue with the meta update (as you said in the later mail).
> > > > >>>> The split regions would not have been updated properly in META, so
> > > > >>>> any client updates/reads going to this region would have stalled,
> > > > >>>> and hence your client application also stalled.
> > > > >>>>
> > > > >>>> As I said, the logs would be important here to know what happened.
> > > > >>>> This could be one such case, and it could be identified from the logs.
> > > >>>>
> > > >>>> Regards
> > > >>>> Ram
> > > >>>>
> > > > >>>> On Sat, Jun 6, 2015 at 1:25 PM, mukund murrali <mukundmurrali9@gmail.com> wrote:
> > > > >>>>
> > > > >>>>> Sorry for misleading you by describing it as a meta split. It was a
> > > > >>>>> meta update during a user region split. This probably caused the
> > > > >>>>> stall. We have now reverted the client configs. So far we haven't
> > > > >>>>> faced the issue again. We expected those changes to cause some kind
> > > > >>>>> of exception or timeout, but clients stalling indefinitely is what
> > > > >>>>> worries us.
> > > >>>>>
> > > > >>>>> On Friday 5 June 2015, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>>> I would suggest reverting client config changes back to defaults. At
> > > > >>>>>> least we will know if the issue is somehow related to client config
> > > > >>>>>> changes.
> > > > >>>>>> On Jun 5, 2015 6:15 AM, "ramkrishna vasudevan" <ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > >>>>>>
> > > > >>>>>>> hbase:meta getting split? It may be some user region; can you check
> > > > >>>>>>> that? If your meta was splitting, then there is something wrong.
> > > > >>>>>>> Can you attach the log snippets?
> > > >>>>>>>
> > > >>>>>>> Sent from phone. Excuse typos.
> > > > >>>>>>> On Jun 5, 2015 6:00 PM, "mukund murrali" <mukundmurrali9@gmail.com> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi
> > > > >>>>>>>>
> > > > >>>>>>>> In our case, at the instant the client thread stalled, there was an
> > > > >>>>>>>> hbase:meta region split happening. So what went wrong? If there is a
> > > > >>>>>>>> split, why should the hconnection thread stall? Did our changes to
> > > > >>>>>>>> the client configuration cause this? Once again, these are the
> > > > >>>>>>>> client-related changes we made:
> > > > >>>>>>>>
> > > > >>>>>>>> hbase.client.retries.number => 5
> > > > >>>>>>>> zookeeper.recovery.retry => 0
> > > > >>>>>>>> zookeeper.session.timeout => 1000
> > > > >>>>>>>> zookeeper.recovery.retry.intervalmilli => 1
> > > > >>>>>>>> hbase.rpc.timeout => 30000
> > > > >>>>>>>>
> > > > >>>>>>>> Is the zk timeout too low?
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > > >>>>>>>> On Fri, Jun 5, 2015 at 11:37 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> When you started your client server, was the META table assigned?
> > > > >>>>>>>>> Maybe something happened around that time and the client app was
> > > > >>>>>>>>> just waiting on the meta table to be assigned. It would have
> > > > >>>>>>>>> retried. Can you check the logs?
> > > > >>>>>>>>>
> > > > >>>>>>>>> So the best part here is that the standalone client was successful,
> > > > >>>>>>>>> which means the new clients were able to talk successfully with the
> > > > >>>>>>>>> server. And hence the restart of your client has solved your
> > > > >>>>>>>>> problem. It may be difficult to troubleshoot the exact issue with
> > > > >>>>>>>>> the limited info, but see if your client app regularly gets stalled;
> > > > >>>>>>>>> if so, it is better to troubleshoot your app and the way it accesses
> > > > >>>>>>>>> the server.
> > > >>>>>>>>>
> > > > >>>>>>>>> On Fri, Jun 5, 2015 at 11:21 AM, PRANEESH KUMAR <praneesh.sankar@gmail.com> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> The client connection was in a stalled state. But there was only
> > > > >>>>>>>>>> one hconnection thread in our thread dump, which was waiting
> > > > >>>>>>>>>> indefinitely in the BoundedCompletionService.take call. Meanwhile we
> > > > >>>>>>>>>> ran a standalone test program, which was successful.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Once we restarted the client server, the problem was resolved.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> The basic question is: when the hconnection thread stalled, why did
> > > > >>>>>>>>>> the HBase client fail to create any more hconnections (max pool size
> > > > >>>>>>>>>> was 10)? And if there was a problem with table/meta regions, how
> > > > >>>>>>>>>> come the test program succeeded?
> > > >>>>>>>>>>
> > > >>>>>>>>>> Regards,
> > > >>>>>>>>>> Praneesh
> > > >>>>>>>>>>
> > > > >>>>>>>>>> On Fri, Jun 5, 2015 at 10:21 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Can you tell us more? Is your client not working at all, i.e. is it
> > > > >>>>>>>>>>> stalled? Or are you seeing some results but finding it slower than
> > > > >>>>>>>>>>> you expected?
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> What type of workload are you running? Are all the tables healthy?
> > > > >>>>>>>>>>> Are you able to read or write to them individually using the hbase
> > > > >>>>>>>>>>> shell?
> > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Fri, Jun 5, 2015 at 10:18 AM, PRANEESH KUMAR <praneesh.sankar@gmail.com> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> Hi Ram,
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> The cluster ran without any problem for about 2 to 3 days under low
> > > > >>>>>>>>>>>> load; once we enabled it for high load, we immediately faced this
> > > > >>>>>>>>>>>> issue.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Regards,
> > > >>>>>>>>>>>> Praneesh.
> > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On Thursday 4 June 2015, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Is your cluster in working condition? Can you see if META has been
> > > > >>>>>>>>>>>>> assigned properly? If the META table is not initialized and opened,
> > > > >>>>>>>>>>>>> then your client thread will hang.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Regards
> > > >>>>>>>>>>>>> Ram
> > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Thu, Jun 4, 2015 at 9:05 PM, PRANEESH KUMAR <praneesh.sankar@gmail.com> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Hi,
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> We are using HBase-1.0.0. We are also facing the same issue, where
> > > > >>>>>>>>>>>>>> the client connection thread is waiting at
> > > > >>>>>>>>>>>>>> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1200).
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Any help is appreciated.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Regards,
> > > > >>>>>>>>>>>>>> Praneesh
