hbase-user mailing list archives

From charan kumar <charan.ku...@gmail.com>
Subject Re: Region Servers Crashing during Random Reads
Date Thu, 03 Feb 2011 20:28:14 GMT
Hi Jonathan,

  Thanks for your quick reply.

Heap is set to 4G.

Following are the JVM opts.
export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:NewSize=6m
-XX:MaxNewSize=6m"
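
For what it's worth, the ParNew entries in the GC log quoted below show promotion failures in a ~5.5 MB new generation, which lines up with NewSize=6m. If it helps, something along these lines is what we would try next (the -Xmn256m figure is only a first guess, not a tested setting):

export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError \
    -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 \
    -XX:+UseCMSInitiatingOccupancyOnly -Xmn256m"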

Are there any other options apart from increasing the RAM?

I am adding some more info about the app.

 > We are storing web page data in HBase.
 > The row key is a hashed URL, for random distribution, since we don't plan to do scans (see the write-path sketch after this list).
 > We have LZO compression set on the page-content column family.
 > We were seeing around 1,500 reads/sec when reading the page content.
 > We have a column family which stores just metadata of the page ("title", etc.). When reading only this family, the performance is a whopping 12,000 TPS.
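
To make that concrete, the write path looks roughly like the sketch below. The "pages" table and the "content"/"meta" family names are illustrative, not our exact schema, and MD5 stands in for whatever hash is used:

import java.security.MessageDigest;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Row key = MD5(url): spreads rows evenly across regions,
// at the cost of meaningful scans, which we don't need.
static void storePage(HTable table, String url, byte[] page, String title)
    throws Exception {
  byte[] row = MessageDigest.getInstance("MD5").digest(Bytes.toBytes(url));
  Put put = new Put(row);
  put.add(Bytes.toBytes("content"), Bytes.toBytes("raw"), page);
  put.add(Bytes.toBytes("meta"), Bytes.toBytes("title"), Bytes.toBytes(title));
  table.put(put);
}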

  We thought the issue could be the network bandwidth used between HBase and the clients. So we disabled LZO compression on the column family and started compressing the raw page on the client and decompressing it when reading (LZO).

 > With this, my write performance jumped from 2,000 to 5,000 at peak.
 > With this approach, though, the servers are crashing... Not sure why it happens only after turning off LZO on the column family and doing the same compression from the client (see the sketch below).
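
For reference, the client-side path is roughly the sketch below. We use LZO in production; GZIP from java.util.zip is shown here only so the snippet is self-contained (the LZO codec needs native libraries), but the structure is the same:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Compress the raw page on the client before the Put.
static byte[] compress(byte[] page) throws Exception {
  ByteArrayOutputStream bos = new ByteArrayOutputStream();
  GZIPOutputStream gz = new GZIPOutputStream(bos);
  gz.write(page);
  gz.close();  // finishes the stream and writes the trailer
  return bos.toByteArray();
}

// Decompress after the Get, when the page is read back.
static byte[] decompress(byte[] stored) throws Exception {
  GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(stored));
  ByteArrayOutputStream out = new ByteArrayOutputStream();
  byte[] buf = new byte[4096];
  for (int n; (n = in.read(buf)) != -1; ) {
    out.write(buf, 0, n);
  }
  return out.toByteArray();
}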



On Thu, Feb 3, 2011 at 12:13 PM, Jonathan Gray <jgray@fb.com> wrote:

> How much heap are you running on your RegionServers?
>
> 6GB of total RAM is on the low end.  For high throughput applications, I
> would recommend at least 6-8GB of heap (so 8+ GB of RAM).
>
> > -----Original Message-----
> > From: charan kumar [mailto:charan.kumar@gmail.com]
> > Sent: Thursday, February 03, 2011 11:47 AM
> > To: user@hbase.apache.org
> > Subject: Region Servers Crashing during Random Reads
> >
> > Hello,
> >
> > I am using HBase 0.90.0 with hadoop-append, on Dell 1950 hardware (2 CPUs, 6 GB RAM).
> >
> > I had 9 region servers crash (out of 30) in a span of 30 minutes during heavy reads. It looks like a GC / ZooKeeper connection timeout thingy to me.
> > I did all the recommended configuration from the HBase wiki... Any other
> > suggestions?
> >
> >
> > 2011-02-03T09:43:07.890-0800: 70693.632: [GC 70693.632: [ParNew (promotion failed): 5555K->5540K(5568K), 0.0280950 secs]70693.660: [CMS2011-02-03T09:43:16.864-0800: 70702.606: [CMS-concurrent-mark: 12.549/69.323 secs] [Times: user=11.90 sys=1.26, real=69.31 secs]
> >
> > 2011-02-03T09:53:35.165-0800: 71320.785: [GC 71320.785: [ParNew (promotion failed): 5568K->5568K(5568K), 0.4384530 secs]71321.224: [CMS2011-02-03T09:53:45.111-0800: 71330.731: [CMS-concurrent-mark: 17.511/51.564 secs] [Times: user=38.72 sys=5.67, real=51.60 secs]
> >
> > The following is the log entry on the region server:
> >
> > 2011-02-03 10:37:43,946 INFO org.apache.zookeeper.ClientCnxn: Client
> > session timed out, have not heard from server in 47172ms for sessionid
> > 0x12db9f722421ce3, closing socket connection and attempting reconnect
> > 2011-02-03 10:37:43,947 INFO org.apache.zookeeper.ClientCnxn: Client
> > session timed out, have not heard from server in 48159ms for sessionid
> > 0x22db9f722501d93, closing socket connection and attempting reconnect
> > 2011-02-03 10:37:44,401 INFO org.apache.zookeeper.ClientCnxn: Opening
> > socket connection to server XXXXXXXXXXXXXXXX
> > 2011-02-03 10:37:44,402 INFO org.apache.zookeeper.ClientCnxn: Socket
> > connection established to XXXXXXXXX, initiating session
> > 2011-02-03 10:37:44,709 INFO org.apache.zookeeper.ClientCnxn: Opening
> > socket connection to server XXXXXXXXXXXXXXX
> > 2011-02-03 10:37:44,709 INFO org.apache.zookeeper.ClientCnxn: Socket
> > connection established to XXXXXXXXXXXXXXXXXXXXX, initiating session
> > 2011-02-03 10:37:44,767 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > started; Attempting to free 81.93 MB of total=696.25 MB
> > 2011-02-03 10:37:44,784 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction
> > completed; freed=81.94 MB, total=614.81 MB, single=379.98 MB,
> > multi=309.77 MB, memory=0 KB
> > 2011-02-03 10:37:45,205 INFO org.apache.zookeeper.ClientCnxn: Unable to
> > reconnect to ZooKeeper service, session 0x22db9f722501d93 has expired,
> > closing socket connection
> > 2011-02-03 10:37:45,206 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, trying to reconnect.
> > 2011-02-03 10:37:45,453 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Trying to reconnect to zookeeper
> > 2011-02-03 10:37:45,206 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x12db9f722421ce3 has expired, closing socket connection
> > regionserver:60020-0x22db9f722501d93 regionserver:60020-0x22db9f722501d93 received expired from ZooKeeper, aborting
> > org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
> >         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:328)
> >         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:246)
> >         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
> >         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> > handled exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing XXXXXXXXXXXX,60020,1296684296172 as dead server
> > org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing XXXXXXXXXXXX,60020,1296684296172 as dead server
> >         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> >         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> >         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> >         at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
> >         at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:80)
> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:729)
> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:586)
> >         at java.lang.Thread.run(Thread.java:619)
> >
> > Thanks,
> > Charan
>
