hbase-user mailing list archives

From Zhenyu Zhong <zhongresea...@gmail.com>
Subject Re: regarding to HBase 1316 ZooKeeper: use native threads to avoid GC stalls (JNI integration)
Date Thu, 29 Oct 2009 21:23:57 GMT
I have 19 quorum members now.

When I tested loading data into two column families of one table in HBase
using two separate MR jobs, I lost my regionserver and the test failed.
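Each job writes to its own column family of the same table; roughly, the
mapper in each job just emits a Put like this (family/column names are
made-up placeholders, only a sketch):

    Put put = new Put(rowKey);
    // job 1 writes family_a; job 2 runs the same code against family_b
    put.add(Bytes.toBytes("family_a"), Bytes.toBytes("col"), value);
    context.write(new ImmutableBytesWritable(rowKey), put);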

Does HBase allow such a table update operation?

The errors I got when I lost my regionserver are:
2009-10-29 21:09:34,705 INFO org.apache.hadoop.hbase.regionserver.HLog: Roll /hbase/.logs/YYYY,60021,1256849619429/hlog.dat.1256849620029, entries=271911, calcsize=63754142, filesize=33975611. New hlog /hbase/.logs/YYYY,60021,1256849619429/hlog.dat.1256850574705
2009-10-29 21:09:50,322 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:571)
        at java.lang.Thread.run(Thread.java:619)
2009-10-29 21:09:50,773 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x1124a2128bcf0001 to sun.nio.ch.SelectionKeyImpl@663257b8
java.io.IOException: TIMED OUT
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
2009-10-29 21:09:50,873 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null
2009-10-29 21:09:51,423 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server YYYY:2181
2009-10-29 21:09:51,423 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.100.118:54789 remote=superpyxis0005/192.168.100.119:2181]
2009-10-29 21:09:51,423 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-10-29 21:09:51,423 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Expired, type: None, path: null
2009-10-29 21:09:51,423 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x1124a2128bcf0001 to sun.nio.ch.SelectionKeyImpl@1829ae5e
java.io.IOException: Session Expired
        at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
        at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2009-10-29 21:09:51,423 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expired
2009-10-29 21:09:51,423 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=254.97333, regions=36, stores=119, storefiles=131, storefileIndexSize=8, memstoreSize=39, usedHeap=85, maxHeap=4079, blockCacheSize=7019112, blockCacheFree=848487832, blockCacheCount=0, blockCacheHitRatio=0
2009-10-29 21:09:53,327 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60021





On Thu, Oct 29, 2009 at 2:51 PM, stack <stack@duboce.net> wrote:

> On Thu, Oct 29, 2009 at 11:46 AM, Zhenyu Zhong <zhongresearch@gmail.com
> >wrote:
>
> > FYI
> > It looks like increasing the number of Zookeeper Quorums can solve the
> > following error message: org.apache.hadoop.hbase.client.NoServerForRegionException:
> > Timed out trying to locate root region at
> > org.apache.hadoop.hbase.
> >
> > You mean quorum members?  How many do you have now?
>
>
>
> > Now I am running Zookeeper quorum on each node I have.
> > However, I am still having issues about losing regionserver.
> >
> > Whats in the logs?
>
>
>
>
> > Is there a way to browse the Znode in zookeeper?
> >
> >
> Type 'zk' in the hbase shell.
>
> You can get to the zk shell from the hbase shell.  You do things like:
>
> > zk "ls /"
>
> (Yes, quotes needed).
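> For example, with the default zookeeper.znode.parent of /hbase you can do
> something like (exact znode names vary a bit between versions):
>
> > zk "ls /hbase"
> > zk "get /hbase/root-region-server"
>
> The first lists HBase's znodes, the second shows which server is currently
> carrying -ROOT-.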
>
> St.Ack
>
>
>
> > thanks
> > zhenyu
> >
> >
> >
> >
> >
> >
> > On Wed, Oct 28, 2009 at 3:40 PM, Zhenyu Zhong <zhongresearch@gmail.com
> > >wrote:
> >
> > > JG,
> > >
> > >
> > > Thanks a lot for the tips.
> > > I set the HEAP to 4GB and GC options as -XX:ParallelGCThreads=8
> > >  -XX:+UseConcMarkSweepGC.
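> > > Concretely, that corresponds to something like this in conf/hbase-env.sh
> > > (just a sketch of my settings, heap size in MB):
> > >
> > > export HBASE_HEAPSIZE=4000
> > > export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=8"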
> > >
> > > I checked the logs on my master and RS and found the following errors.
> > > Basically, the master got an exception while scanning ROOT, then the ROOT
> > > region was taken offline and unset. After that, requests to the
> > > regionserver got NotServingRegion errors.
> > >
> > > In the master:
> > > 2009-10-28 19:00:30,591 INFO
> org.apache.hadoop.hbase.master.BaseScanner:
> > > RegionManager.rootScanner scanning meta region {server: x.x.x.x:60021, regionname: -ROOT-,,0, startKey: <>}
> > > 2009-10-28 19:00:30,591 WARN
> org.apache.hadoop.hbase.master.BaseScanner:
> > > Scan ROOT region
> > > java.io.IOException: Call to /x.x.x.x:60021 failed on local exception:
> > > java.io.EOFException
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:757)
> > >         at
> > > org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:727)
> > >         at
> > > org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:328)
> > >         at $Proxy1.openScanner(Unknown Source)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:160)
> > >         at
> > >
> org.apache.hadoop.hbase.master.RootScanner.scanRoot(RootScanner.java:54)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.master.RootScanner.maintenanceScan(RootScanner.java:79)
> > >         at
> > > org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:136)
> > >         at org.apache.hadoop.hbase.Chore.run(Chore.java:68)
> > > Caused by: java.io.EOFException
> > >         at java.io.DataInputStream.readInt(DataInputStream.java:375)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:504)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:448)
> > > 2009-10-28 19:00:30,591 INFO
> org.apache.hadoop.hbase.master.BaseScanner:
> > > RegionManager.metaScanner scanning meta region {server: x.x.x.x:60021, regionname: .META.,,1, startKey: <>}
> > > 2009-10-28 19:00:30,591 WARN
> org.apache.hadoop.hbase.master.BaseScanner:
> > > Scan one META region: {server: x.x.x.x:60021, regionname: .META.,,1, startKey: <>}
> > > java.net.ConnectException: Connection refused
> > >         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> > >         at
> > > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
> > >         at
> > >
> >
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> > >         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:308)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:831)
> > >         at
> > > org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:712)
> > >         at
> > > org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:328)
> > >         at $Proxy1.openScanner(Unknown Source)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:160)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
> > >         at
> > > org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:136)
> > >         at org.apache.hadoop.hbase.Chore.run(Chore.java:68)
> > > 2009-10-28 19:00:30,591 INFO
> org.apache.hadoop.hbase.master.BaseScanner:
> > > All 1 .META. region(s) scanned
> > > 2009-10-28 19:00:31,395 INFO
> > org.apache.hadoop.hbase.master.ServerManager:
> > > Removing server's info YYYY,60021,1256755470570
> > > 2009-10-28 19:00:31,395 INFO
> > org.apache.hadoop.hbase.master.RegionManager:
> > > Offlined ROOT server: x.x.x.x:60021
> > >
> > > 2009-10-28 19:00:31,395 INFO
> > org.apache.hadoop.hbase.master.RegionManager:
> > > -ROOT- region unset (but not set to be reassigned)
> > > 2009-10-28 19:00:31,395 INFO
> > org.apache.hadoop.hbase.master.RegionManager:
> > > ROOT inserted into regionsInTransition
> > > 2009-10-28 19:00:31,395 INFO
> > org.apache.hadoop.hbase.master.RegionManager:
> > > Offlining META region: {server: x.x.x.x:60021, regionname: .META.,,1,
> > > startKey: <>}
> > > 2009-10-28 19:00:31,395 INFO
> > org.apache.hadoop.hbase.master.RegionManager:
> > > META region removed from onlineMetaRegions
> > >
> > >
> > >
> > > On the regionserver:
> > > 2009-10-28 18:51:14,578 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> > > test,,1256755871065
> > > 2009-10-28 18:51:14,578 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> > MSG_REGION_OPEN:
> > > test,,1256755871065
> > > 2009-10-28 18:51:14,578 INFO
> > org.apache.hadoop.hbase.regionserver.HRegion:
> > > region test,,1256755871065/796855017 available; sequence id is 10013291
> > > 2009-10-28 18:51:14,578 INFO
> > org.apache.hadoop.hbase.regionserver.HRegion:
> > > Starting compaction on region test,,1256755871065
> > > 2009-10-28 18:51:18,388 DEBUG org.apache.zookeeper.ClientCnxn: Got ping
> > > response for sessionid:0x249c76021d0001 after 0ms
> > > 2009-10-28 18:51:19,341 ERROR
> > > org.apache.hadoop.hbase.regionserver.HRegionServer:
> > > org.apache.hadoop.hbase.NotServingRegionException: test,,1256754924503
> > >         at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2307)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1784)
> > >         at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> > >         at
> > >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >         at java.lang.reflect.Method.invoke(Method.java:597)
> > >         at
> > > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> > >         at
> > >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> > > 2009-10-28 18:51:19,341 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > Server
> > > handler 0 on 60021, call get([B@21fefd80, row=1053508149,
> maxVersions=1,
> > > timeRange=[0,9223372036854775807), families={(family=email_ip_activity,
> > > columns=ALL}) from x.x.x.x:54669: error:
> > > org.apache.hadoop.hbase.NotServingRegionException: test,,1256754924503
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Oct 28, 2009 at 2:56 PM, Jonathan Gray <jlist@streamy.com>
> > wrote:
> > >
> > >> These client error messages are not particularly descriptive as to the
> > >> root cause (they are fatal errors, or close to it).
> > >>
> > >> What is going on in your regionservers when these errors happen?
>  Check
> > >> the master and RS logs.
> > >>
> > >> Also, you definitely do not want 19 zookeeper nodes.  Reduce that to 3
> > or
> > >> 5 max.
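> > >> For example, something like this in hbase-site.xml on every node (the
> > >> host names are placeholders):
> > >>
> > >> <property>
> > >>   <name>hbase.zookeeper.quorum</name>
> > >>   <value>zk1.example.com,zk2.example.com,zk3.example.com,zk4.example.com,zk5.example.com</value>
> > >> </property>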
> > >>
> > >> What is the hardware you are using for these nodes, and what settings
> do
> > >> you have for heap/GC?
> > >>
> > >> JG
> > >>
> > >>
> > >> Zhenyu Zhong wrote:
> > >>
> > >>> Stack,
> > >>>
> > >>> Thank you very much for your comments.
> > >>> I am running a cluster with 20 nodes. I run 19 of them as both
> > >>> regionservers and zookeeper quorum members.
> > >>> The versions I am using are Hadoop 0.20.1 and HBase 0.20.1.
> > >>> I started with an empty table and tried to load 200 million records
> > >>> into that empty table.
> > >>> There is a key in each record. In my MR program, during setup I open an
> > >>> HTable; in my mapper I fetch the record from the HTable via the key in
> > >>> the record, make some changes to the columns, and write that row back to
> > >>> the HTable through TableOutputFormat by passing a Put. There are no
> > >>> reduce tasks involved here. (Though it is unnecessary to fetch a row
> > >>> from an empty table, I intended to do that.)
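> > >>> Roughly, the mapper does something like this (table/family/column names
> > >>> and the line parsing are made-up placeholders, and constructor details
> > >>> vary a bit by HBase version; the job is map-only with TableOutputFormat
> > >>> as the output format):
> > >>>
> > >>> import java.io.IOException;
> > >>> import org.apache.hadoop.hbase.HBaseConfiguration;
> > >>> import org.apache.hadoop.hbase.client.Get;
> > >>> import org.apache.hadoop.hbase.client.HTable;
> > >>> import org.apache.hadoop.hbase.client.Put;
> > >>> import org.apache.hadoop.hbase.client.Result;
> > >>> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> > >>> import org.apache.hadoop.hbase.util.Bytes;
> > >>> import org.apache.hadoop.io.LongWritable;
> > >>> import org.apache.hadoop.io.Text;
> > >>> import org.apache.hadoop.mapreduce.Mapper;
> > >>>
> > >>> public class UpdateMapper
> > >>>     extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
> > >>>
> > >>>   private static final byte[] FAMILY = Bytes.toBytes("cf");     // placeholder family
> > >>>   private static final byte[] QUALIFIER = Bytes.toBytes("col"); // placeholder column
> > >>>   private HTable table;
> > >>>
> > >>>   @Override
> > >>>   protected void setup(Context context) throws IOException {
> > >>>     // one HTable per mapper, opened once during setup
> > >>>     table = new HTable(new HBaseConfiguration(context.getConfiguration()), "test");
> > >>>   }
> > >>>
> > >>>   @Override
> > >>>   protected void map(LongWritable offset, Text line, Context context)
> > >>>       throws IOException, InterruptedException {
> > >>>     byte[] row = Bytes.toBytes(line.toString().split("\t")[0]); // key = first field
> > >>>     Result existing = table.get(new Get(row)); // fetch current row (empty on a new table)
> > >>>     // ... look at `existing` and compute the changed column value here ...
> > >>>     Put put = new Put(row);
> > >>>     put.add(FAMILY, QUALIFIER, Bytes.toBytes(line.toString())); // placeholder new value
> > >>>     // TableOutputFormat performs the actual write to the table
> > >>>     context.write(new ImmutableBytesWritable(row), put);
> > >>>   }
> > >>> }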
> > >>>
> > >>> Additionally, when I reduced the number of regionservers and the
> > >>> number of zookeeper quorum members, I got different errors:
> > >>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> > >>> trying
> > >>> to locate root region at
> > >>>
> > >>>
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:929)
> > >>> at
> > >>>
> > >>>
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:580)
> > >>> at
> > >>>
> > >>>
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:562)
> > >>> at
> > >>>
> > >>>
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:693)
> > >>> at
> > >>>
> > >>>
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:589)
> > >>> at
> > >>>
> > >>>
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:562)
> > >>> at
> > >>>
> > >>>
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:693)
> > >>> at
> > >>>
> > >>>
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:593)
> > >>> at
> > >>>
> > >>>
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:556)
> > >>> at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:127) at
> > >>> org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:105) at
> > >>> org.apache.hadoop.hbase.mapreduce.TableOutputFormat.getRecordWriter(TableOutputFormat.java:116)
> > >>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:573) at
> > >>> org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at
> > >>> org.apache.hadoop.mapred.Child.main(Child.java:170)
> > >>>
> > >>> Many thanks in advance.
> > >>> zhenyu
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Wed, Oct 28, 2009 at 12:39 PM, stack <stack@duboce.net> wrote:
> > >>>
> > >>>  Whats your cluster topology?  How many nodes involved?  When you see
> > the
> > >>>> below message, how many regions in your table?  How are you loading
> > your
> > >>>> table?
> > >>>> Thanks,
> > >>>> St.Ack
> > >>>>
> > >>>> On Wed, Oct 28, 2009 at 7:45 AM, Zhenyu Zhong <
> > zhongresearch@gmail.com
> > >>>>
> > >>>>> wrote:
> > >>>>> Nitay,
> > >>>>>
> > >>>>> I really appreciate it.
> > >>>>>
> > >>>>> As Ryan suggested, I increased the zookeeper session timeout to
> > >>>>> 40 seconds, with the GC options -XX:ParallelGCThreads=8
> > >>>>> -XX:+UseConcMarkSweepGC in place. I set the heap size to 4GB. I also
> > >>>>> set vm.swappiness=0.
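> > >>>>> In hbase-site.xml that corresponds to roughly this (value in
> > >>>>> milliseconds):
> > >>>>>
> > >>>>> <property>
> > >>>>>   <name>zookeeper.session.timeout</name>
> > >>>>>   <value>40000</value>
> > >>>>> </property>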
> > >>>>>
> > >>>>> However, it still ran into problems. Please find the following errors.
> > >>>>>
> > >>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> > >>>>> contact region server x.x.x.x:60021 for region
> > >>>>> YYYY,117.99.7.153,1256396118155, row '1170491458', but failed after 10
> > >>>>> attempts.
> > >>>>> Exceptions:
> > >>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > >>>>> setting up proxy to /x.x.x.x:60021 after attempts=1
> > >>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > >>>>> setting up proxy to /x.x.x.x:60021 after attempts=1
> > >>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > >>>>> setting up proxy to /x.x.x.x:60021 after attempts=1
> > >>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > >>>>> setting up proxy to /x.x.x.x:60021 after attempts=1
> > >>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > >>>>> setting up proxy to /x.x.x.x:60021 after attempts=1
> > >>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > >>>>> setting up proxy to /x.x.x.x:60021 after attempts=1
> > >>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > >>>>> setting up proxy to /x.x.x.x:60021 after attempts=1
> > >>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > >>>>> setting up proxy to /x.x.x.:60021 after attempts=1
> > >>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > >>>>> setting up proxy to /x.x.x.x:60021 after attempts=1
> > >>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
> > >>>>> setting up proxy to /x.x.x.x:60021 after attempts=1
> > >>>>>
> > >>>>>       at
> > >>>>>
> > >>>>>
> >
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1001)
> > >>>>
> > >>>>>       at org.apache.hadoop.hbase.client.HTable.get(HTable.java:413)
> > >>>>>
> > >>>>>
> > >>>>> The input file is about 10GB, around 200 million rows of data.
> > >>>>> This load doesn't seem too large. However, these kinds of errors keep
> > >>>>> popping up.
> > >>>>>
> > >>>>> Does Regionserver need to be deployed to dedicated machines?
> > >>>>> Does Zookeeper need to be deployed to dedicated machines as
well?
> > >>>>>
> > >>>>> Best,
> > >>>>> zhenyu
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Wed, Oct 28, 2009 at 1:37 AM, nitay <nitayj@gmail.com>
wrote:
> > >>>>>
> > >>>>>  Hi Zhenyu,
> > >>>>>>
> > >>>>>> Sorry for the delay. I started working on this a while back, before
> > >>>>>> I left my job for another company. Since then I haven't had much time
> > >>>>>> to work on HBase unfortunately :(. I'll try to dig up what I had and
> > >>>>>> see what shape it's in and update you.
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>> -n
> > >>>>>>
> > >>>>>>
> > >>>>>> On Oct 27, 2009, at 3:38 PM, Ryan Rawson wrote:
> > >>>>>>
> > >>>>>>> Sorry I must have mistyped, I meant to say "40 seconds". You can
> > >>>>>>> still see multi-second pauses at times, so you need to give yourself
> > >>>>>>> a bigger buffer.
> > >>>>>>>
> > >>>>>>> The parallel threads argument should not be necessary, but you do
> > >>>>>>> need the UseConcMarkSweepGC flag as well.
> > >>>>>>>
> > >>>>>>> Let us know how it goes!
> > >>>>>>> -ryan
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Tue, Oct 27, 2009 at 3:19 PM, Zhenyu Zhong <
> > >>>>>>>
> > >>>>>> zhongresearch@gmail.com>
> > >>>>
> > >>>>>  wrote:
> > >>>>>>>
> > >>>>>>>  Ryan,
> > >>>>>>>> I really appreciate your feedback.
> > >>>>>>>> I have set the zookeeper.session.timeout to seconds which is way
> > >>>>>>>> higher than 40ms.
> > >>>>>>>> At the same time, -Xms is set to 4GB, which should be sufficient.
> > >>>>>>>> I also tried GC options like
> > >>>>>>>>
> > >>>>>>>>  -XX:ParallelGCThreads=8
> > >>>>>>>> -XX:+UseConcMarkSweepGC
> > >>>>>>>>
> > >>>>>>>> I even set the vm.swappiness=0
> > >>>>>>>>
> > >>>>>>>> However, I still came across the problem of a RegionServer
> > >>>>>>>> shutting itself down.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> zhong
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Tue, Oct 27, 2009 at 6:05 PM, Ryan Rawson <
> ryanobjc@gmail.com>
> > >>>>>>>>
> > >>>>>>> wrote:
> > >>>>>
> > >>>>>>>>> Set the ZK timeout to something like 40ms, and give the GC enough
> > >>>>>>>>> Xmx so you never risk entering the much dreaded
> > >>>>>>>>> concurrent-mode-failure whereby the entire heap must be GCed.
> > >>>>>>>>>
> > >>>>>>>>> Consider testing Java 7 and the G1 GC.
> > >>>>>>>>>
> > >>>>>>>>> We could get a JNI thread to do this, but no one has done so yet.
> > >>>>>>>>> I am personally hoping for G1 and in the meantime overprovision our
> > >>>>>>>>> Xmx to avoid the concurrent mode failures.
> > >>>>>>>>>
> > >>>>>>>>> -ryan
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Oct 27, 2009 at 2:59 PM, Zhenyu Zhong
<
> > >>>>>>>>>
> > >>>>>>>> zhongresearch@gmail.com>
> > >>>>>
> > >>>>>>  wrote:
> > >>>>>>>>>
> > >>>>>>>>>  Ryan,
> > >>>>>>>>>>
> > >>>>>>>>>> Thank you very much.
> > >>>>>>>>>> May I ask whether there are any ways to get around this problem
> > >>>>>>>>>> to make HBase more stable?
> > >>>>>>>>>>
> > >>>>>>>>>> best,
> > >>>>>>>>>> zhong
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Tue, Oct 27, 2009 at 4:06 PM, Ryan Rawson
<
> > ryanobjc@gmail.com>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> There isn't any working code yet. Just an idea, and a prototype.
> > >>>>>>>>>>> There is some sense that if we can get the G1 GC that we could
> > >>>>>>>>>>> get rid of all long pauses, and avoid the need for this.
> > >>>>>>>>>>>
> > >>>>>>>>>>> -ryan
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Mon, Oct 26, 2009 at 2:30 PM, Zhenyu
Zhong <
> > >>>>>>>>>>> zhongresearch@gmail.com>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>  Hi,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I am very interested in the solution that Joey proposed and
> > >>>>>>>>>>>> would like to give it a try.
> > >>>>>>>>>>>> Does anyone have any ideas on how to deploy this zk_wrapper in
> > >>>>>>>>>>>> JNI integration?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I would really appreciate it.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> thanks
> > >>>>>>>>>>>> zhong
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>
> > >
> >
>
