hbase-user mailing list archives

From Bradford Stephens <bradfordsteph...@gmail.com>
Subject Re: HBase Failing on Large Loads
Date Fri, 12 Jun 2009 17:19:40 GMT
I don't think we're using ZK, I'm on HBase 0.19.4... am I wrong? :)
I've already got the GC configured the way you suggested, and I'm not
seeing very long pauses in the GC log. I really think the problem is
resource starvation, because I only have 2 cores total on each of those
boxes, and each one is running HBase on top of a Hadoop DataNode and
TaskTracker. Am I right in this thinking?
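
For reference, here's roughly what I've got in hbase-env.sh (paraphrasing
from memory -- the heap is the 1000 MB I mentioned, but the GC log path is
just an example, not copied verbatim):

export HBASE_HEAPSIZE=1000
# CMS + incremental mode as suggested for 2-4 core boxes, plus verbose GC
# logging so we can check pause times
export HBASE_OPTS="-server -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
    -Xloggc:/tmp/hbase-gc.log"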

On Thu, Jun 11, 2009 at 10:29 PM, Ryan Rawson<ryanobjc@gmail.com> wrote:
> Since you are on a 2-4 cpu system, you need to use:
>
> "-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
>
> What do your verbose GC logs say? Are you getting huge pauses?
>
> You can up the ZK timeout. Try doing this in zoo.cfg on both server and client:
>
> tickTime=20000
> initLimit=5
> syncLimit=2
>
> and in hbase-site.xml:
> <property>
> <name>zookeeper.session.timeout</name>
> <value>60000</value>
> </property>
>
> This will give you a much higher ZooKeeper timeout.
>
> Let us know!
>
>
>
> On Thu, Jun 11, 2009 at 10:25 PM, Bradford Stephens <bradfordstephens@gmail.com> wrote:
>
>> Thanks for helping me, o people of awesomeness.
>>
>> VM settings are a 1000 MB heap for HBase, and I used the GC settings laid
>> out in the wiki, plus "-server"... basically, I did everything here:
>> http://wiki.apache.org/hadoop/PerformanceTuning , and on
>>
>> http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html
>>
>> On Thu, Jun 11, 2009 at 8:02 PM, Ryan Rawson<ryanobjc@gmail.com> wrote:
>> > What are you vm/gc settings?  Let's tune that!
>> >
>> > On Jun 11, 2009 7:08 PM, "Bradford Stephens" <bradfordstephens@gmail.com> wrote:
>> >
>> > OK, so I discovered the ulimit wasn't changed like I thought it was;
>> > I had to fool with PAM in Ubuntu to get it to stick.
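
Side note in case anyone else hits this on Ubuntu: what I ended up doing
was roughly the following (the user name and limit value here are just
examples of the idea, not my exact settings):

# /etc/security/limits.conf -- raise the open-file limit for the user
# that runs the Hadoop/HBase daemons
hadoop  soft  nofile  32768
hadoop  hard  nofile  32768

# /etc/pam.d/common-session -- make PAM actually apply limits.conf at login
session required pam_limits.so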
>> >
>> > Everything's running a little better, and I cut the data size by 66%.
>> >
>> > It took a while, but one of the machines with only 2 cores failed, and
>> > I caught it in the moment. Then 2 other machines failed a few minutes
>> > later in a cascade. I'm thinking that HBase + Hadoop takes up so much
>> > proc time that the machine gradually stops responding to heartbeats...
>> > does that seem rational?
>> >
>> > Here's the first regionserver log: http://pastebin.com/m96e06fe
>> > I wish I could attach the log of one of the regionservers that failed
>> > a few minutes later, but it's 708MB! Here's some examples of the tail:
>> >
>> >  2009-06-11 19:00:18,418 WARN
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report
>> > to master for 906196 milliseconds - retrying
>> > 2009-06-11 19:00:18,419 WARN
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: error getting
>> > store file index size for 944890031/url:
>> > java.io.FileNotFoundException: File does not exist:
>> > hdfs://dttest01:54310/hbase-0.19/joinedcontent/944890031/url/mapfiles/2512503149715575970/index
>> >
>> > The HBase Master log is surprisingly quiet...
>> >
>> > Overall, I think HBase just isn't happy on a machine with two
>> > single-core procs, and when they start dropping like flies, everything
>> > goes to hell. Do my log files support this?
>> >
>> > Cheers,
>> > Bradford
>> >
>> > On Wed, Jun 10, 2009 at 4:01 PM, Ryan Rawson<ryanobjc@gmail.com> wrote:
>> >
>> > Hey, > > Looks lke you h...
>> >
>>
>
