hbase-user mailing list archives

From Zhenyu Zhong <zhongresea...@gmail.com>
Subject Re: regarding to HBase 1316 ZooKeeper: use native threads to avoid GC stalls (JNI integration)
Date Wed, 28 Oct 2009 14:45:30 GMT
Nitay,

I really appreciate your help.

As Ryan suggested, I increased the ZooKeeper session timeout to 40 seconds
and put the GC options -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC in place.
I set the heap size to 4GB, and I also set vm.swappiness=0.
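
In case the exact settings matter, this is roughly how they are wired up on my
nodes (paraphrased from my hbase-site.xml / hbase-env.sh, so treat the values
as my setup rather than a recommendation):

  hbase-site.xml:
    <property>
      <name>zookeeper.session.timeout</name>
      <value>40000</value>  <!-- 40 seconds, in milliseconds -->
    </property>

  hbase-env.sh:
    export HBASE_HEAPSIZE=4000   # ~4GB heap
    export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=8"

  OS level, on every node:
    sysctl -w vm.swappiness=0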

However, it still ran into problems. Please see the errors below.

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
contact region server x.x.x.x:60021 for region
YYYY,117.99.7.153,1256396118155, row '1170491458', but failed after 10
attempts.
Exceptions:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed
setting up proxy to /x.x.x.x:60021 after attempts=1

	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1001)
	at org.apache.hadoop.hbase.client.HTable.get(HTable.java:413)


The input file is about 10GB, around 200 million rows of data.
This load doesn't seem too large, yet these errors keep popping up.
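
If I am reading the client defaults right (I have not verified this against
the code, so treat it as an assumption on my part), the "failed after 10
attempts" in the trace is governed by the client-side retry settings in
hbase-site.xml, roughly:

    <property>
      <name>hbase.client.retries.number</name>
      <value>10</value>    <!-- attempts before RetriesExhaustedException -->
    </property>
    <property>
      <name>hbase.client.pause</name>
      <value>1000</value>  <!-- pause between retries, in milliseconds -->
    </property>

Bumping these would presumably only mask the underlying pauses rather than fix
them.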

Do the RegionServers need to be deployed on dedicated machines?
Does ZooKeeper need to be deployed on dedicated machines as well?

Best,
zhenyu



On Wed, Oct 28, 2009 at 1:37 AM, nitay <nitayj@gmail.com> wrote:

> Hi Zhenyu,
>
> Sorry for the delay. I started working on this a while back, before I left
> my job for another company. Since then I haven't had much time to work on
> HBase unfortunately :(. I'll try to dig up what I had and see what shape
> it's in and update you.
>
> Cheers,
> -n
>
>
> On Oct 27, 2009, at 3:38 PM, Ryan Rawson wrote:
>
>  Sorry, I must have mistyped; I meant to say "40 seconds". You can
>> still see multi-second pauses at times, so you need to give yourself a
>> bigger buffer.
>>
>> The parallel threads argument should not be necessary, but you do need
>> the UseConcMarkSweepGC flag as well.
>>
>> Let us know how it goes!
>> -ryan
>>
>>
>> On Tue, Oct 27, 2009 at 3:19 PM, Zhenyu Zhong <zhongresearch@gmail.com>
>> wrote:
>>
>>> Ryan,
>>> I really appreciate your feedback.
>>> I have set zookeeper.session.timeout to a value in seconds, which is way
>>> higher than 40ms.
>>> At the same time, -Xms is set to 4GB, which should be sufficient.
>>> I also tried GC options like
>>>
>>>  -XX:ParallelGCThreads=8
>>> -XX:+UseConcMarkSweepGC
>>>
>>> I even set the vm.swappiness=0
>>>
>>> However, I still came across the problem that a RegionServer shut itself
>>> down.
>>>
>>> Best,
>>> zhong
>>>
>>>
>>> On Tue, Oct 27, 2009 at 6:05 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>
>>>  Set the ZK timeout to something like 40ms, and give the GC enough Xmx
>>>> so you never risk entering the much dreaded concurrent-mode-failure
>>>> whereby the entire heap must be GCed.
>>>>
>>>> Consider testing Java 7 and the G1 GC.
>>>>
>>>> We could get a JNI thread to do this, but no one has done so yet. I am
>>>> personally hoping for G1, and in the meantime we overprovision our Xmx to
>>>> avoid the concurrent mode failures.
>>>>
>>>> -ryan
>>>>
>>>> On Tue, Oct 27, 2009 at 2:59 PM, Zhenyu Zhong <zhongresearch@gmail.com>
>>>> wrote:
>>>>
>>>>> Ryan,
>>>>>
>>>>> Thank you very much.
>>>>> May I ask whether there are any ways to get around this problem to make
>>>>> HBase more stable?
>>>>>
>>>>> best,
>>>>> zhong
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 27, 2009 at 4:06 PM, Ryan Rawson <ryanobjc@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>  There isn't any working code yet, just an idea and a prototype.
>>>>>>
>>>>>> There is some sense that if we can get the G1 GC, we could get rid
>>>>>> of all long pauses and avoid the need for this.
>>>>>>
>>>>>> -ryan
>>>>>>
>>>>>> On Mon, Oct 26, 2009 at 2:30 PM, Zhenyu Zhong <
>>>>>> zhongresearch@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am very interested in the solution that Joey proposed and would
>>>>>>> like to give it a try.
>>>>>>> Does anyone have any ideas on how to deploy this zk_wrapper in JNI
>>>>>>> integration?
>>>>>>>
>>>>>>> I would really appreciate it.
>>>>>>>
>>>>>>> thanks
>>>>>>> zhong
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
