hbase-user mailing list archives

From "who.cat" <who....@qq.com>
Subject Re: HBase resgionServer crashed with no gc detected
Date Thu, 20 Oct 2016 01:01:08 GMT
I've uploaded the file to GitHub; the URL is: https://github.com/eswidy/waterspider/blob/master/regionServer.log

Thanks so much.




------------------ Original ------------------
From:  "Ted Yu";<yuzhihong@gmail.com>;
Date:  Oct 19, 2016
To:  "user@hbase.apache.org"<user@hbase.apache.org>; 

Subject:  Re: HBase resgionServer crashed with no gc detected



The log file was not delivered by the mailing list.

Consider using pastebin or third party site.

On Tue, Oct 18, 2016 at 10:38 PM, who.cat <who.cat@qq.com> wrote:

> Thanks for the reply. Yes, I had not turned on DEBUG logging; I am trying it
> now. I also suspected heavy CPU load as the cause, but when I checked, the
> highest CPU utilization was 60% (user CPU).
> My region server  gc parameter is :export SERVER_GC_OPTS="-verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:{{log_dir}}/gc.log-`date
> +'%Y%m%d%H%M'`"
> The 10/12 log has been rolled over. I got the same crash log yesterday (10/18).
> Details are in the attachment 'regionServer.log'; the JVM pause is at
> "2016-10-17 18:44:07,232", on line 82.
> Thanks so much.
>
>
>
>
>
> ------------------ Original Message ------------------
> *From:* "Ted Yu";<yuzhihong@gmail.com>;
> *Sent:* Wednesday, October 19, 2016, 11:17 AM
> *To:* "user@hbase.apache.org"<user@hbase.apache.org>;
> *Subject:* Re: HBase resgionServer crashed with no gc detected
>
> Can you show more of the region server log prior to 23:48:13 (including the
> pause) ?
>
> Was the region server under heavy load during the pause ?
>
> Consider turning on DEBUG logging if you haven't.
>
> Please also share GC parameters.
>
> Thanks
>
> On Tue, Oct 18, 2016 at 7:58 PM, who.cat <who.cat@qq.com> wrote:
>
> > Hi all,
> > I have a 4-node HDP big data cluster created with Ambari; the HBase
> > version is 1.1.2.
> > While running YCSB as a benchmark, the RegionServer instance or the HMaster
> > instance crashes, and its log shows:
> >
> > ---------------------log start ---------------------
> > 2016-10-12 23:48:13,591 INFO  [main-SendThread(Node1:2181)]
> > zookeeper.ClientCnxn: Unable to read additional data from server
> > sessionid
> > 0x157b7f5f0bc0005, likely server has closed socket, closing socket
> > connection and attempting reconnect
> > 2016-10-12 23:48:13,595 INFO  [HBase-Metrics2-1] impl.MetricsSinkAdapter:
> > Sink timeline started
> > 2016-10-12 23:48:13,606 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl:
> > Scheduled snapshot period at 10 second(s).
> > 2016-10-12 23:48:13,606 INFO  [HBase-Metrics2-1] impl.MetricsSystemImpl:
> > HBase metrics system started
> > 2016-10-12 23:48:14,496 INFO  [main-SendThread(Node4:2181)]
> > zookeeper.ClientCnxn: Opening socket connection to server Node4/
> > 1.1.6.104:2181. Will not attempt to authenticate using SASL (unknown
> > error)
> > 2016-10-12 23:48:14,506 INFO  [main-SendThread(Node4:2181)]
> > zookeeper.ClientCnxn: Socket connection established to Node4/
> > 1.17.6.104:2181, initiating session
> > 2016-10-12 23:48:14,517 INFO  [main-SendThread(Node4:2181)]
> > zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session
> > 0x157b7f5f0bc0005 has expired, closing socket connection
> > 2016-10-12 23:48:14,517 FATAL [main-EventThread]
> > regionserver.HRegionServer: ABORTING region server
> > node1,16020,1476260847716: regionserver:16020-0x157b7f5f0bc0005,
> > quorum=node2:2181,node1:2181,node4:2181, baseZNode=/hbase-unsecure
> > regionserver:16020-0x157b7f5f0bc0005 received expired from ZooKeeper,
> > aborting
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired
> >         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.
> > connectionEvent(ZooKeeperWatcher.java:585)
> >         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.
> > process(ZooKeeperWatcher.java:517)
> >         at org.apache.zookeeper.ClientCnxn$EventThread.
> > processEvent(ClientCnxn.java:534)
> >         at org.apache.zookeeper.ClientCnxn$EventThread.run(
> > ClientCnxn.java:510)
> > 2016-10-12 23:48:14,518 FATAL [main-EventThread]
> > regionserver.HRegionServer: RegionServer abort: loaded coprocessors are:
> > [org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint]
> > ---------------------log end---------------------
> >
> > After checking the log, it appears that the region server JVM paused for a
> > long time, so the ZooKeeper client could not send heartbeats and the session
> > timed out, as described in the reference guide:
> > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > So I read the log in detail to find the Java GC event, but no full GC had
> > occurred.
> > I also found the same symptom in the DataNode instance.
> >
> > The node OS is CentOS 7, so I suspected the kernel futex bug, but after
> > checking, that bug is already fixed in my OS.
> > Is there any other factor besides Java GC that could cause this problem?
> > Has anyone run into the same problem? Any ideas?
> > Thank you.
>
>
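One way to locate a pause that the GC log does not explain is to look for a silence in the region server log's own timestamps. The following is a minimal Python sketch, assuming log4j-style lines that begin with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp as in the excerpt above. The function name `find_gaps` and the 10-second default threshold are illustrative choices, not part of HBase; a sensible threshold would be something below the configured `zookeeper.session.timeout`.

```python
import re
from datetime import datetime

# Leading timestamp of a log line, e.g. "2016-10-12 23:48:13,591".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def find_gaps(lines, threshold_s=10.0):
    """Return (previous_ts, current_ts, gap_seconds) for every silence
    between consecutive timestamped lines that exceeds threshold_s."""
    gaps = []
    prev = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            # Wrapped lines and stack-trace frames carry no timestamp.
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap > threshold_s:
                gaps.append((prev, ts, gap))
        prev = ts
    return gaps


if __name__ == "__main__":
    # The first line is hypothetical; the other two are shortened from the
    # excerpt in this thread.
    sample = [
        "2016-10-12 23:47:30,000 INFO  [example] hypothetical line before the silence",
        "2016-10-12 23:48:13,591 INFO  [main-SendThread(Node1:2181)] zookeeper.ClientCnxn: Unable to read additional data from server",
        "2016-10-12 23:48:14,517 FATAL [main-EventThread] regionserver.HRegionServer: ABORTING region server",
    ]
    for before, after, seconds in find_gaps(sample):
        print(f"silence of {seconds:.3f}s between {before} and {after}")
```

To run it against a real file, pass `open("regionServer.log")` as `lines`. A quiet log is only a hint, since an idle server also logs little, so any gap found this way should be compared with the GC log and OS-level metrics (swap, disk I/O) for the same interval.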