hbase-user mailing list archives

From Zheng Lv <lvzheng19800...@gmail.com>
Subject Re: is there any problem with our environment?
Date Tue, 24 Nov 2009 06:29:42 GMT
Hello Stack,
Can you see my mail? It seems you haven't seen it.

2009/11/21 stack <stack@duboce.net>

> On Fri, Nov 20, 2009 at 12:28 AM, Zheng Lv <lvzheng19800619@gmail.com> wrote:
>
> > Hello Stack,
> > Remember the "no route to host" exceptions from last time? They are gone
> > now, and the test program can now run for several days.
>
>
> How did you fix it?
>
>
>
> > Thank you.
> > Recently we started running our crawling program, which crawls web pages
> > and then inserts them into hbase.
> > But we got many "org.apache.hadoop.hbase.NotServingRegionException" errors
> > like this:
> >
> > 2009-11-20 12:36:41,898 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> > org.apache.hadoop.hbase.NotServingRegionException:
> > webpage,http://bbs.city.tianya.cn/tianyacity/Content/178/1/536629.shtml,1258691377544
> >
>
> So figure out what's happening to that region by grepping for its name in the
> master log.  Why is it offline so long?  Are the machines loaded?  Swapping?
>
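One way to act on the advice above is a small log filter. The excerpts in this thread show the same region name written with plain '/' (in the regionserver's NotServingRegionException) and with '/' printed as the literal text \x2F (in the master's MSG_REPORT_SPLIT line), so this minimal sketch, assuming the master log is a plain text file, searches for both spellings. The script name and the helpers `escaped_form` and `grep_region` are hypothetical, not part of HBase.

```python
import sys
from typing import Iterable, List


def escaped_form(region_name: str) -> str:
    """Write each '/' as the literal text \\x2F, as in the master log excerpt."""
    return region_name.replace("/", "\\x2F")


def grep_region(lines: Iterable[str], region_name: str) -> List[str]:
    """Return the lines that mention the region in either spelling."""
    forms = (region_name, escaped_form(region_name))
    return [line for line in lines if any(form in line for form in forms)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python grep_region.py MASTER_LOG REGION_NAME
    log_path, region = sys.argv[1], sys.argv[2]
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for hit in grep_region(log, region):
            print(hit, end="")
```

Passing the region name from the NotServingRegionException above prints, in log order, every master line that mentions it, which makes it easier to see when the region went offline and whether its daughters were assigned. A plain `grep -F` with both spellings would do the same job.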
> Are the crawlers running on same machines as hbase?
>
> What crawler are you using?
>
> Andrew Purtell has written up some notes, in private correspondence, on
> getting a good balance between the crawl process and hbase so that
> everything runs smoothly.  Let me ask him if it's OK to forward them to the list.
>
>
> ....
>
> > 2009-11-20 12:36:25,259 INFO org.apache.hadoop.hbase.master.ServerManager:
> > Processing MSG_REPORT_SPLIT:
> > webpage,http:\x2F\x2Fbbs.city.tianya.cn\x2Ftianyacity\x2FContent\x2F178\x2F1\x2F536629.shtml,1258691377544:
> > Daughters; webpage,http:\x2F\x2Fbbs.city.tianya.cn\x2Ftianyacity\x2FContent\x2F178\x2F1\x2F536629.shtml,1258691779496,
> > webpage,http:\x2F\x2Fbbs.city.tianya.cn\x2Ftianyacity\x2FContent\x2F329\x2F1\x2F164370.shtml,1258691779496
> > from ubuntu12,60020,1258687326554;
> >
>
> Yeah, it's split.  That's normal.  What's not normal is the client not
> finding the daughter split in its new location.  Did the daughters get
> deployed promptly?
>
>
>
> > And a few hours later, some regionservers shut down.
> >
> > I read the mail that my partner Angus sent,
> > http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200907.mbox/%3C9b27a8a60907272122y1bfa6254n95948942d5ca7f88@mail.gmail.com%3E
> > In that mail you told us it was a case of HBASE-1671, whose Fix Version is
> > 0.20.0, but the hbase version we are using is already 0.20.0.
> >
>
> Can you update to hbase 0.20.2?  It has a bunch of fixes that could be
> related to the above.
> Yours,
> St.Ack
>
>
>
> > Any idea?
> > Best Regards,
> > LvZheng
> >
> > 2009/10/13 stack <stack@duboce.net>
> >
> > > Thanks for posting.  It's much easier reading the logs from there.
> > >
> > > Looking in nohup.out, I see it can't find region
> > > 'webpage,http:\x2F\x2Fnews.163.com\x2F09\x2F080\x2F0\x2F5FOO155J0001124J.html1255072992000_751685,1255316061169'.
> > > It never finds it.  Going by the master log, it looks like it was assigned
> > > successfully to 192.168.33.5.  Once you've figured out the
> > > hardware/networking issues, let's work on getting that region back online.
> > >
> > > The master's session with zk timed out because of 'no route to host'.
> > >
> > > St.Ack
> > >
> > > On Mon, Oct 12, 2009 at 12:23 AM, Zheng Lv <lvzheng19800619@gmail.com> wrote:
> > >
> > > > Hello Stack,
> > > >    I have enabled DEBUG and restarted the test program. This time the
> > > > master shut down, and I have put the logs on SkyDrive:
> > > > http://cid-a331bb289a14fbef.skydrive.live.com/browse.aspx/.Public?uc=2&isFromRichUpload=1
> > > >    "nohup.out" is our test program's log, and "hbase-cyd-master-ubuntu6.log"
> > > > is the master log.
> > > >
> > > >    Also, today we found that running "dmesg" showed many messages like
> > > > "[3641697.122769] r8169: eth0: link down". I think this might be the
> > > > reason for so many "no route to host" and "Time Out" errors. Our system
> > > > administrator is checking now; when we have a result we will let you know. :)
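To check how often the link actually drops, a minimal sketch like the following could tally the "link down" messages per driver and interface. It assumes dmesg lines look like the one quoted above ("[seconds] driver: iface: link down"); other drivers may word this differently, and `link_down_counts` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape, taken from the dmesg output quoted above:
#   "[3641697.122769] r8169: eth0: link down"
LINK_DOWN = re.compile(r"^\[\s*\d+\.\d+\]\s+([^:\s]+): ([^:\s]+): link down")


def link_down_counts(lines: Iterable[str]) -> Counter:
    """Count 'link down' messages per (driver, interface) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = LINK_DOWN.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

For example, `link_down_counts(subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines())` would return a count per interface; a non-zero count on the interface carrying HBase and ZooKeeper traffic is consistent with the "no route to host" errors.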
> > > >    Thanks,
> > > >    LvZheng.
> > > >
> > > > 2009/10/11 stack <stack@duboce.net>
> > > >
> > > > > On Fri, Oct 9, 2009 at 3:18 AM, Zheng Lv <lvzheng19800619@gmail.com> wrote:
> > > > >
> > > > > > ...
> > > > > > so,
> > > > > >    > please remove the delay so hbase fails faster so it doesn't take so
> > > > > >    > long to figure the issue.
> > > > > >    > Are you inserting every 10ms because hbase is falling over on you?  If
> > > > > >    Yes, I inserted every 10ms because I was afraid hbase would fall over.
> > > > > >    Now I have removed the delay.
> > > > > >
> > > > > >    After doing this, we ran the test program again, and one region
> > > > > >    server shut down after about 2 hours, another after 3.
> > > > > >    I will post the logs from these two servers in follow-up mails.
> > > > > >
> > > > > >
> > > > > Thanks for doing the above.
> > > > >
> > > > > In future, when debugging, please enable DEBUG and put your logs
> > > > > somewhere I can pull them from, or put them up on pastebin.  Logs in
> > > > > email messages are hard to follow.  Thanks.
> > > > >
> > > > >
> > > > > >    > Ok.  So this is hbase 0.20.0?  Tell us about your hardware.  What
> > > > > >    > kind is it?  CPU/RAM/Disks.
> > > > > >    Yes, we are using hbase 0.20.0. Our hardware is as follows:
> > > > > >
> > > > > >    CPU: AMD X3 710
> > > > > >    RAM: 8 GB DDR2-800
> > > > > >    Disk: 270 GB (RAID 0)
> > > > > >
> > > > > >
> > > > > That's an interesting chip: 3 cores!  The above should be fine as long
> > > > > as you corral any mapreduce jobs running on the same cluster.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > >    We have 7 servers with the above hardware: one for the master, three
> > > > > >    for namenodes/regionservers, and the other three for ZooKeeper.
> > > > > >    By the way, what kind of hardware and environment do you suggest
> > > > > >    we use?
> > > > > >
> > > > >
> > > > >
> > > > > This configuration seems fine to start with.  Later we might experiment
> > > > > with running zk on the same machines as the regionservers, then raise
> > > > > the number of regionservers to 6 and the quorum members to 5.
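Growing the quorum from 3 to 5 members matters because a ZooKeeper ensemble stays available only while a strict majority of its members are up. The arithmetic, sketched below with hypothetical helper names (not from the thread or from ZooKeeper itself), shows that 3 members survive one failure and 5 survive two, while 4 members survive no more failures than 3, which is why odd ensemble sizes are the usual choice.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a majority is still up."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size - majority(ensemble_size)


for n in (3, 4, 5):
    # prints "3 2 1", "4 3 1", "5 3 2" on separate lines
    print(n, majority(n), tolerated_failures(n))
```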
> > > > >
> > > > > St.Ack
> > > > >
> > > > >
> > > > > >
> > > > > >    Thank you, very much.
> > > > > >    LvZheng.
> > > > > >
> > > > >
> > > >
> > >
> >
>
