hbase-user mailing list archives

From Jim Kellerman <...@powerset.com>
Subject RE: HBase performance tuning
Date Wed, 26 Mar 2008 15:07:58 GMT
The only time the master is contacted by clients is to determine
the region server that is serving the root region. After that,
the client scans the meta region(s) to find the region servers
that are serving specific regions of interest. Each client
only opens one connection to each region server, but only if
that region server is serving a region of interest to the client.

So the number of connections open to each region server is at
most 5000 (one per client). If each client only touches regions
on a single region server, it is closer to
5000 / number of region servers.
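
To make the counting argument above concrete, here is a minimal sketch in Java. It is a toy model, not HBase code: the class name, the method name and the "rs1"/"rs2" server names are all made up for illustration, and it assumes each client can be described by the set of region servers that host its regions of interest. Since a client opens one connection per server in its set, a server's connection count is the number of clients whose set contains it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: these names are hypothetical and not part of the HBase API.
class ConnectionCount {

    // Each element of clients is the set of region servers hosting regions
    // that the client reads or writes. A client opens one connection per
    // server in its set, so a server's count is the number of clients
    // whose set contains that server.
    public static Map<String, Integer> connectionsPerServer(List<Set<String>> clients) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> servers : clients) {
            for (String server : servers) {
                counts.merge(server, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three clients and two region servers (rs1, rs2).
        List<Set<String>> clients = List.of(
            Set.of("rs1"),
            Set.of("rs1", "rs2"),
            Set.of("rs2"));
        Map<String, Integer> counts = connectionsPerServer(clients);
        System.out.println("rs1=" + counts.get("rs1") + " rs2=" + counts.get("rs2"));
    }
}
```

In this model no server can ever see more connections than there are clients, and the master does not appear in the count at all.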


---
Jim Kellerman, Senior Engineer; Powerset


> -----Original Message-----
> From: Andy Li [mailto:annndy.lee@gmail.com]
> Sent: Wednesday, March 26, 2008 12:00 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: HBase performance tuning
>
> I have a sample that runs MR where each Map or Reduce task
> talks to HBase via the HTable class.
>
> But before I put it online on the Wiki page, I need to
> confirm something that may cause a performance issue when
> the cluster grows.  Basically, the problem is that if I have
> 5000 Maps running and each of them creates an HTable
> instance that applies BatchUpdate, will that create
> 5000 connections to the HBase master?  I have only done it at a
> smaller scale (100 Mappers) and I didn't see any problem, but it
> will require profiling and some instrumentation of the system and
> code to figure out.  It would be better to fork a new topic for
> this one.
>
> -Andy
>
> On Tue, Mar 25, 2008 at 10:26 PM, Goel, Ankur <Ankur.Goel@corp.aol.com> wrote:
>
> > A sample would definitely be good. Even better if we could put it on
> > the wiki for everyone else. If you don't have enough spare cycles then
> > do let me know and I shall write the sample and put it back on the wiki.
> >
> > Thanks
> > -Ankur
> >
> >
> > -----Original Message-----
> > From: stack [mailto:stack@duboce.net]
> > Sent: Tuesday, March 25, 2008 7:24 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: HBase performance tuning
> >
> > Your insert is single-threaded?  At a minimum your program should be
> > multithreaded.  Randomize the keys on your data so that the inserts
> > are spread across your 9 regionservers.  Better if you spend a bit of
> > time and write a mapreduce job to do the insert (If you want a sample,
> > write the list again and I'll put something together).
> > St.Ack
> >
> > ANKUR GOEL wrote:
> > > Hi Folks,
> > > I have a table with the following column families in the schema:
> > >        {"referer_id:", "100"},  (Integer here is max length)
> > >        {"url:","1500"},
> > >        {"site:","500"},
> > >        {"status:","100"}
> > >
> > > The common attributes for all the above column families are [max
> > > versions: 1,  compression: NONE, in memory: false, block cache
> > > enabled: true, max length: 100, bloom filter: none]
> > >
> > > [HBase Configuration]:
> > >   - HDFS runs on 10 machine nodes with 8 GB RAM each and 4 CPU cores.
> > >   - HMaster runs on a different machine than NameNode.
> > >   - There are 9 regionservers configured.
> > >   - Total DFS available = 150 GB.
> > >   - LAN speed is 100 Mbps.
> > >
> > > I am trying to insert approx 4.8 million rows and the speed that I
> > > get is around 1500 row inserts per sec (about 100,000 row inserts
> > > per min).
> > >
> > > It takes around 50 min to insert all the seeds. The Java program
> > > that does the inserts uses buffered I/O to read the data from a
> > > local file and runs on the same machine as the HMaster. To give you
> > > an idea of the Java code that does the insert, here is a snapshot
> > > of the loop.
> > >
> > > while ((url = seedReader.readLine()) != null) {
> > >      try {
> > >        BatchUpdate update = new BatchUpdate(new Text(md5(normalizedUrl)));
> > >        update.put(new Text("url:"), getBytes(url));
> > >        update.put(new Text("site:"), getBytes(new URL(url).getHost()));
> > >        update.put(new Text("status:"), getBytes(status));
> > >        seedlist.commit(update); // seedlist is the HTable
> > >       }
> > > ....
> > > ....
> > >
> > > Is there a way to tune HBase to achieve better I/O speeds?
> > > Ideally I would like to reduce the total insert time to less than 15
> > > min, i.e. achieve an insert speed of around 4500 rows/sec or more.
> > >
> > > Thanks
> > > -Ankur
> > >
> > >
> >
> >
>
