hbase-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: Region Servers going down frequently
Date Wed, 08 Apr 2009 07:21:04 GMT
I'm not sure I can answer that definitively, but my guess is no, it won't hamper
performance.


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Wed, Apr 8, 2009 at 12:13 AM, Rakhi Khatwani <rakhi.khatwani@gmail.com> wrote:

> Hi Amandeep,
>
> But in that case, if I let HBase split it automatically, my table with 17,000
> rows will have only one region, and thus my analysis will have only one map.
> Won't the analysis process be slower in that case?
>
> Thanks,
> Raakhi
>
> On Wed, Apr 8, 2009 at 12:35 PM, Amandeep Khurana <amansk@gmail.com> wrote:
>
> > You can't compensate for RAM with processing power. HBase keeps a lot of open
> > file handles in HDFS, which needs memory, so you need the RAM.
> >
> > Secondly, 17,000 rows isn't much to cause a region split. I don't know the
> > exact numbers, but I had a table with 6 million rows and only 3 regions. So
> > that's not a big deal.
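> >
> > Roughly, regions split on size, not on row count: a region splits once one of
> > its store files grows past hbase.hregion.max.filesize, which defaults to 256MB,
> > so a small table stays in a single region. If you want more regions, and hence
> > more maps, one option is to lower that threshold in hbase-site.xml. A minimal
> > sketch, with an illustrative 64MB value rather than a recommendation:
> >
> >   <property>
> >     <name>hbase.hregion.max.filesize</name>
> >     <value>67108864</value> <!-- 64MB; the default is 268435456 (256MB) -->
> >   </property>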
> >
> > Thirdly, try upping the xceivers and the ulimit and see if it works with
> > the existing RAM... That's the only way out.
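> >
> > The xceiver limit is the dfs.datanode.max.xcievers property (note the
> > transposed "ie" in the name) in each datanode's hadoop-site.xml, and the
> > ulimit is the open-files limit for the user running the daemons. A rough
> > sketch of the two changes, using the numbers from the earlier mails
> > (2048 xceivers, 32k files):
> >
> >   <property>
> >     <name>dfs.datanode.max.xcievers</name>
> >     <value>2048</value>
> >   </property>
> >
> > and, assuming the daemons run as a user named "hadoop", in
> > /etc/security/limits.conf:
> >
> >   hadoop  -  nofile  32768
> >
> > then restart the daemons so the new limits take effect; the HBase FAQ has the
> > full steps.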
> >
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
> >
> > On Wed, Apr 8, 2009 at 12:02 AM, Rakhi Khatwani <rakhi.khatwani@gmail.com> wrote:
> >
> > > Hi Amandeep,
> > >
> > > Following is my EC2 cluster configuration:
> > > High-CPU Medium Instance: 1.7 GB of memory, 5 EC2 Compute Units (2 virtual
> > > cores with 2.5 EC2 Compute Units each), 350 GB of instance storage, 32-bit
> > > platform.
> > >
> > > So I don't think I have much of an option when it comes to memory. However,
> > > is there any way I can make use of the 5 EC2 compute units to increase my
> > > performance?
> > >
> > > Regarding the table splits, I don't see HBase doing the splits automatically.
> > > After loading about 17,000 rows into table1, I can still see it as one region
> > > (after checking on the web UI); that's why I had to split it manually. Or is
> > > there any configuration/setting I have to change to ensure that the tables are
> > > split automatically?
> > >
> > > I will increase the dataXceivers limit and raise the ulimit to 32k.
> > >
> > > Thanks a ton
> > > Rakhi.
> > >
> > >
> > >
> > > >
> > > > > Hi Amandeep,
> > > > > I have 1GB of memory on each node of the EC2 cluster (C1 Medium).
> > > > > I am using hadoop-0.19.0 and hbase-0.19.0.
> > > > > We were starting with 10,000 rows, but later it will go up to 100,000
> > > > > rows.
> > > >
> > > >
> > > > 1GB is too low. You need around 4GB to get a stable system.
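> > > >
> > > > The region server heap is set with HBASE_HEAPSIZE in conf/hbase-env.sh (the
> > > > value is in MB; the default is 1000). As a sketch of what 4GB would look
> > > > like on a box with enough memory:
> > > >
> > > >   export HBASE_HEAPSIZE=4000
> > > >
> > > > On a 1.7GB instance there is simply no room for that, which is why the RAM
> > > > itself is the limiting factor.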
> > > >
> > > > >
> > > > >
> > > > > My map task basically reads an HBase table, 'Table1', performs analysis on
> > > > > each row, and dumps the analysis results into another HBase table, 'Table2'.
> > > > > Each analysis task takes about 3-4 minutes when tested on a local machine
> > > > > (the algorithm part, without the map-reduce).
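> > > > >
> > > > > Roughly, the job is wired up like this (a simplified sketch against the
> > > > > org.apache.hadoop.hbase.mapred API in 0.19; the 'data:' and 'results:'
> > > > > column families and the analyse() body are placeholders for my real code):
> > > > >
> > > > >   import java.io.IOException;
> > > > >   import org.apache.hadoop.hbase.io.*;      // ImmutableBytesWritable, RowResult, BatchUpdate
> > > > >   import org.apache.hadoop.hbase.mapred.*;  // TableMap, TableMapReduceUtil, IdentityTableReduce
> > > > >   import org.apache.hadoop.hbase.util.Bytes;
> > > > >   import org.apache.hadoop.mapred.*;        // JobConf, JobClient, MapReduceBase, OutputCollector, Reporter
> > > > >
> > > > >   public class AnalysisJob {
> > > > >
> > > > >     public static class AnalysisMap extends MapReduceBase
> > > > >         implements TableMap<ImmutableBytesWritable, BatchUpdate> {
> > > > >
> > > > >       public void map(ImmutableBytesWritable row, RowResult cells,
> > > > >           OutputCollector<ImmutableBytesWritable, BatchUpdate> out,
> > > > >           Reporter reporter) throws IOException {
> > > > >         byte[] result = analyse(cells);           // the 3-4 minute analysis
> > > > >         BatchUpdate update = new BatchUpdate(row.get());
> > > > >         update.put("results:score", result);      // placeholder output column
> > > > >         out.collect(row, update);
> > > > >       }
> > > > >
> > > > >       private byte[] analyse(RowResult cells) {
> > > > >         // stand-in for the real algorithm: just echo one input cell
> > > > >         return cells.get(Bytes.toBytes("data:value")).getValue();
> > > > >       }
> > > > >     }
> > > > >
> > > > >     public static void main(String[] args) throws IOException {
> > > > >       // read the 'data:' family of Table1, write analysis results into Table2
> > > > >       JobConf job = new JobConf(AnalysisJob.class);
> > > > >       TableMapReduceUtil.initTableMapJob("Table1", "data:", AnalysisMap.class,
> > > > >           ImmutableBytesWritable.class, BatchUpdate.class, job);
> > > > >       TableMapReduceUtil.initTableReduceJob("Table2", IdentityTableReduce.class, job);
> > > > >       JobClient.runJob(job);
> > > > >     }
> > > > >   }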
> > > > >
> > > > > I have divided 'Table1' into 30 regions before sending it to the map, and
> > > > > set the maximum number of map tasks to 20.
> > > >
> > > > Let HBase do the division into regions. Leave the table as it is, in its
> > > > default state.
> > > >
> > > > >
> > > > > I have set dataXceivers to 1024 and the ulimit to 1024.
> > > >
> > > > Yes, increase these: 2048 dataXceivers and a 32k ulimit.
> > > >
> > > > >
> > > > > I am able to process about 300 rows in an hour, which I feel is quite slow.
> > > > > How do I increase the performance?
> > > >
> > > > The reasons are mentioned above.
> > > >
> > > > >
> > > > >
> > > > > Meanwhile I will try setting the dataXceivers to 2048 and increasing the
> > > > > file limit as you mentioned.
> > > > >
> > > > > Thanks,
> > > > > Rakhi
> > > > >
> > > > > On Wed, Apr 8, 2009 at 11:40 AM, Amandeep Khurana <amansk@gmail.com> wrote:
> > > > >
> > > > > > 20 nodes is good enough to begin with. How much memory do you have on each
> > > > > > node? IMO, you should keep 1GB per daemon and 1GB for the MR job, like
> > > > > > Andrew suggested.
> > > > > > You don't necessarily have to separate the datanodes and tasktrackers as
> > > > > > long as you have enough resources.
> > > > > > 10,000 rows isn't big at all from an HBase standpoint. What kind of
> > > > > > computation are you doing before dumping data into HBase? And what versions
> > > > > > of Hadoop and HBase are you running?
> > > > > >
> > > > > > There's another thing you should do. Increase the DataXceivers limit to 2048
> > > > > > (that's what I use).
> > > > > >
> > > > > > If you have root privilege on the cluster, then increase the file limit to
> > > > > > 32k (see the HBase FAQ for details).
> > > > > >
> > > > > > Try this out and see how it goes.
> > > > > >
> > > > > >
> > > > > > Amandeep Khurana
> > > > > > Computer Science Graduate Student
> > > > > > University of California, Santa Cruz
> > > > > >
> > > > > >
> > > > > > On Tue, Apr 7, 2009 at 2:45 AM, Rakhi Khatwani <rakhi.khatwani@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > I have a 20-node cluster on EC2 (small instances). I have a set of tables
> > > > > > > which store a huge amount of data (tried with 10,000 rows; more to be
> > > > > > > added). But during my map-reduce jobs, some of the region servers shut
> > > > > > > down, causing data loss, stopping my program execution, and in fact
> > > > > > > damaging one of my tables. Whenever I scan that table, I get the "could
> > > > > > > not obtain block" error.
> > > > > > >
> > > > > > > 1. I want to make the cluster more robust, since it contains a lot of data
> > > > > > > and it's really important that it remains stable.
> > > > > > > 2. If one of my tables gets damaged (even after restarting DFS and HBase),
> > > > > > > how do I go about recovering it?
> > > > > > >
> > > > > > > My EC2 cluster mostly has the default configuration, with hadoop-site and
> > > > > > > hbase-site having some entries pertaining to map-reduce (for example, the
> > > > > > > number of map tasks, mapred.task.timeout, etc.).
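> > > > > > >
> > > > > > > The map-reduce entries are along these lines (just to show the shape; the
> > > > > > > values below are illustrative):
> > > > > > >
> > > > > > >   <property>
> > > > > > >     <name>mapred.map.tasks</name>
> > > > > > >     <value>20</value>
> > > > > > >   </property>
> > > > > > >   <property>
> > > > > > >     <name>mapred.task.timeout</name>
> > > > > > >     <value>1800000</value> <!-- 30 minutes, in milliseconds -->
> > > > > > >   </property>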
> > > > > > >
> > > > > > > Your help will be greatly appreciated.
> > > > > > > Thanks,
> > > > > > > Raakhi Khatwani
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
