hbase-user mailing list archives

From Adrien Mogenet <adrien.moge...@gmail.com>
Subject Re: High Full GC count for Region server
Date Thu, 31 Oct 2013 10:20:46 GMT
The "responseTooSlow" message is logged whenever a call (here a batch of
operations) takes longer than a configured amount of time. In your case the
logged multi call has a processingtimems of 15827, i.e. about 16 seconds; a
large batch can take that long, so no need to worry about this.
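
To see what such a line actually records, here is a minimal Python sketch that pulls the JSON payload out of a slow-call WARN line like the ones quoted below. The helper name parse_slow_call and the regular expression are my own and not part of HBase; the sample line is copied from the quoted region server log. As far as I know the threshold in 0.94 is set by hbase.ipc.warn.response.time, but treat that as an assumption and check it against your hbase-default.xml.

```python
import json
import re

# Matches the JSON payload that follows "(responseTooSlow):" or "(operationTooSlow):".
# This pattern is an illustration, not something HBase ships.
SLOW_CALL = re.compile(r"\((responseTooSlow|operationTooSlow)\):\s*(\{.*\})")


def parse_slow_call(line):
    """Return (kind, fields) for a slow-call WARN line, or None if it is not one."""
    match = SLOW_CALL.search(line)
    if match is None:
        return None
    return match.group(1), json.loads(match.group(2))


# Sample line taken from the region server log quoted in this thread.
line = (
    '2013-10-29 01:01:16,475 WARN org.apache.hadoop.ipc.HBaseServer: '
    '(responseTooSlow): {"processingtimems":15827,'
    '"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2918e464), '
    'rpc version=1, client version=29, methodsFingerPrint=-1368823753",'
    '"client":"192.168.20.31:50619","starttimems":1382988660645,'
    '"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}'
)

kind, fields = parse_slow_call(line)
# processingtimems and queuetimems are durations in milliseconds.
print(kind, fields["method"], fields["processingtimems"], fields["queuetimems"])
```

A queuetimems of 0 next to a large processingtimems suggests the time was spent executing the call rather than waiting for a handler.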

However, your SocketTimeoutException might be due to long GC pauses. It
might also be due to network failures or RS contention (too many requests
on this RS, no IPC slots left, ...).
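
To put rough numbers on the GC and memory figures quoted below, here is a small back-of-envelope sketch in Python. The function names are made up for illustration. The 0.4 global memstore fraction is what I believe to be the 0.94 default for hbase.regionserver.global.memstore.upperLimit; that is an assumption, so please verify it for your version.

```python
def full_gc_interval_seconds(full_gc_count, span_days):
    """Average time between full GCs over the observed span."""
    return span_days * 24 * 3600 / full_gc_count


def memstore_estimate_mb(heap_mb, flush_size_mb, families, global_limit=0.4):
    """Return (per-region worst case, global cap) in MB.

    per-region worst case: every family's memstore filled up to the flush size.
    global cap: heap fraction allowed for all memstores together
    (global_limit=0.4 is an assumed default, not read from any config).
    """
    per_region = flush_size_mb * families
    cap = heap_mb * global_limit
    return per_region, cap


# Figures from the quoted messages: 950 full GCs in 4 days,
# 2 GB heap, 128 MB flush size, 3 column families.
interval = full_gc_interval_seconds(950, 4)
per_region, cap = memstore_estimate_mb(2048, 128, 3)
print(f"one full GC every {interval:.0f} s on average")
print(f"per-region worst case {per_region} MB, global memstore cap {cap:.1f} MB")
```

Note that the 384 MB figure is per region, so with several regions the global cap is what bounds memstore usage, not the per-region flush size alone.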


On Thu, Oct 31, 2013 at 9:52 AM, Vimal Jain <vkjk89@gmail.com> wrote:

> Hi,
> Can anyone please reply to the above query ?
>
>
> On Tue, Oct 29, 2013 at 10:48 AM, Vimal Jain <vkjk89@gmail.com> wrote:
>
> > Hi,
> > Here is my analysis of this problem. Please correct me if I am wrong
> > somewhere.
> > I have assigned 2 GB to the region server process. I think that is
> > sufficient to handle around 9 GB of data.
> > I have not changed many of the parameters, in particular the memstore
> > size, which is 128 MB by default in 0.94.7.
> > Also, as I understand it, each column family has one memstore associated
> > with it, so my memstores are taking 128*3 = 384 MB (I have 3 column
> > families).
> > So I think I should reduce the memstore size to something like 32/64 MB so
> > that data is flushed to disk more frequently than it is now. This will
> > save some memory.
> > Is there any parameter other than memstore size that affects memory
> > utilization?
> >
> > Also, I am getting the exceptions below in the data node log and region
> > server log every day. Is it due to long GC pauses?
> >
> > Data node logs :-
> >
> > hadoop-hadoop-datanode-woody.log:2013-10-29 00:12:13,127 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):Got exception while serving blk_-560908881317618221_58058 to /192.168.20.30:
> > hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/192.168.20.30:39413]
> > hadoop-hadoop-datanode-woody.log:2013-10-29 00:12:13,127 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):DataXceiver
> > hadoop-hadoop-datanode-woody.log:java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/192.168.20.30:39413]
> >
> >
> > Region server logs :-
> >
> > hbase-hadoop-regionserver-woody.log:2013-10-29 01:01:16,475 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":15827,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2918e464), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"192.168.20.31:50619","starttimems":1382988660645,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > hbase-hadoop-regionserver-woody.log:2013-10-29 06:01:27,459 WARN org.apache.hadoop.ipc.HBaseServer: (operationTooSlow): {"processingtimems":14745,"client":"192.168.20.31:50908","timeRange":[0,9223372036854775807],"starttimems":1383006672707,"responsesize":55,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"oinfo":["clubStatus"]},"row":"1752869","queuetimems":1,"method":"get","totalColumns":1,"maxVersions":1}
> >
> >
> >
> >
> >
> > On Mon, Oct 28, 2013 at 11:55 PM, Asaf Mesika <asaf.mesika@gmail.com> wrote:
> >
> >> Check through the HDFS UI that your cluster hasn't reached maximum disk
> >> capacity
> >>
> >> On Thursday, October 24, 2013, Vimal Jain wrote:
> >>
> >> > Hi Ted/Jean,
> >> > Can you please help here ?
> >> >
> >> >
> >> > On Tue, Oct 22, 2013 at 10:29 PM, Vimal Jain <vkjk89@gmail.com> wrote:
> >> >
> >> > > Hi Ted,
> >> > > Yes, I checked the namenode and datanode logs and found the exceptions
> >> > > below in both logs:-
> >> > >
> >> > > Name node :-
> >> > > java.io.IOException: File /hbase/event_data/433b61f2a4ebff8f2e4b89890508a3b7/.tmp/99797a61a8f7471cb6df8f7b95f18e9e could only be replicated to 0 nodes, instead of 1
> >> > >
> >> > > java.io.IOException: Got blockReceived message from unregistered or dead node blk_-2949905629769882833_52274
> >> > >
> >> > > Data node :-
> >> > > 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:50010 remote=/192.168.20.30:36188]
> >> > >
> >> > > ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):DataXceiver
> >> > >
> >> > > java.io.EOFException: while trying to read 39309 bytes
> >> > >
> >> > >
> >> > > On Tue, Oct 22, 2013 at 10:19 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> > >
> >> > >> bq. java.io.IOException: File /hbase/event_data/4c3765c51911d6c67037a983d205a010/.tmp/bfaf8df33d5b4068825e3664d3e4b2b0 could only be replicated to 0 nodes, instead of 1
> >> > >>
> >> > >> Have you checked Namenode / Datanode logs ?
> >> > >> Looks like hdfs was not stable.
> >> > >>
> >> > >>
> >> > >> On Tue, Oct 22, 2013 at 9:01 AM, Vimal Jain <vkjk89@gmail.com> wrote:
> >> > >>
> >> > >> > Hi Jean,
> >> > >> > Thanks for your reply.
> >> > >> > I have 8 GB of memory in total, distributed as follows:-
> >> > >> >
> >> > >> > Region server - 2 GB
> >> > >> > Master, Namenode, Datanode, Secondary Namenode, ZooKeeper - 1 GB
> >> > >> > OS - 1 GB
> >> > >> >
> >> > >> > Please let me know if you need more information.
> >> > >> >
> >> > >> >
> >> > >> > On Tue, Oct 22, 2013 at 8:15 PM, Jean-Marc Spaggiari <
> >> > >> > jean-marc@spaggiari.org> wrote:
> >> > >> >
> >> > >> > > Hi Vimal,
> >> > >> > >
> >> > >> > > What are your settings? Memory of the host, and memory allocated for
> >> > >> > > the different HBase services?
> >> > >> > >
> >> > >> > > Thanks,
> >> > >> > >
> >> > >> > > JM
> >> > >> > >
> >> > >> > >
> >> > >> > > 2013/10/22 Vimal Jain <vkjk89@gmail.com>
> >> > >> > >
> >> > >> > > > Hi,
> >> > >> > > > I am running HBase in pseudo-distributed mode (Hadoop version
> >> > >> > > > 1.1.2, HBase version 0.94.7).
> >> > >> > > > I am getting a few exceptions in both the Hadoop (namenode,
> >> > >> > > > datanode) logs and the HBase (region server) logs.
> >> > >> > > > When I searched for these exceptions on Google, I concluded that
> >> > >> > > > the problem is mainly due to a large number of full GCs in the
> >> > >> > > > region server process.
> >> > >> > > >
> >> > >> > > > I used jstat and found a total of 950 full GCs in a span of 4
> >> > >> > > > days for the region server process. Is this OK?
> >> > >> > > >
> >> > >> > > > I am totally confused by the number of exceptions I am getting.
> >> > >> > > > I also get the exceptions below intermittently.
> >> > >> > > >
> >> > >> > > >
> >> > >> > > > Region server:-
> >> > >> > > >
> >> > >> > > > 2013-10-22 12:00:26,627 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":15312,"call":"next(-6681408251916104762, 1000), rpc version=1, client version=29, methodsFingerPrint=-1368823753","client":"192.168.20.31:48270","starttimems":1382423411293,"queuetimems":0,"class":"HRegionServer","responsesize":4808556,"method":"next"}
> >> > >> > > > 2013-10-22 12:06:17,606 WARN org.apache.hadoop.ipc.HBaseServer: (operationTooSlow): {"processingtimems":14759,"client":"192.168.20.31:48247","timeRange":[0,9223372036854775807],"starttimems":1382423762845,"responsesize":61,"class":"HRegionServer","table":"event_data","cacheBlocks":true,"families":{"gin
> >>
> >
> >
> >
> > --
> > Thanks and Regards,
> > Vimal Jain
> >
>
>
>
> --
> Thanks and Regards,
> Vimal Jain
>



-- 
Adrien Mogenet
http://www.borntosegfault.com
