hbase-user mailing list archives

From "Goel, Ankur" <Ankur.G...@corp.aol.com>
Subject RE: HBase performance tuning
Date Wed, 26 Mar 2008 05:24:08 GMT
I check out 'trunk' and build the code locally.
Does the latest code there have all the fixes?

Thanks
-Ankur

-----Original Message-----
From: stack [mailto:stack@duboce.net] 
Sent: Tuesday, March 25, 2008 7:25 PM
To: hbase-user@hadoop.apache.org
Subject: Re: HBase performance tuning

Are you using the hbase 0.1 branch?  Try the release candidate.  It has a
fix for the 'table does not exist' issue, among other fixes.
St.Ack

Goel, Ankur wrote:
> Hi Again,
>           A couple of issues that I faced are as follows
>
> 1. If I terminate the local client (the Java program used for the
>    inserts; please see the post before this), HBase goes into an
>    inconsistent state. The tables are still shown as available, but an
>    attempt to drop the table gives the message "Table does not exist".
>    If I try to truncate the table, I get an IOException on the client
>    from the 'TableOperation' class.
>    It looks like the abrupt closure of the socket connection on the
>    client side corrupted the META information and also the data files.
>    Is this a known issue? If not, I can reproduce it and file a JIRA
>    issue with a stack trace and description.
>
> 2. Connecting to the region servers from a remote location and
>    inserting data from a file local to the remote client gave an insert
>    speed of 3 rows/sec = 180 rows/min! This is terribly slow when the
>    available bandwidth is 2 Mbps. Any ideas on what the bottleneck
>    could be here?
>
> Thanks
> -Ankur
>
>
> -----Original Message-----
> From: ANKUR GOEL [mailto:ankur.goel@corp.aol.com]
> Sent: Tuesday, March 25, 2008 7:05 PM
> To: hbase-user@hadoop.apache.org
> Subject: HBase performance tuning
>
> Hi Folks,
>              I have a table with the following column families in the 
> schema
>         {"referer_id:", "100"},  (Integer here is max length)
>         {"url:","1500"},
>         {"site:","500"},
>         {"status:","100"}
>
> The common attributes for all the above column families are [max
> versions: 1, compression: NONE, in memory: false, block cache enabled:
> true, max length: 100, bloom filter: none]
>
> [HBase Configuration]:
>    - HDFS runs on 10 nodes, each with 8 GB RAM and 4 CPU cores.
>    - HMaster runs on a different machine than the NameNode.
>    - There are 9 regionservers configured.
>    - Total DFS available = 150 GB.
>    - LAN speed is 100 Mbps.
>
> I am trying to insert approx 4.8 million rows and the speed that I get
> is around 1500 row inserts per sec (100,000 row inserts per min.).
>
> It takes around 50 min to insert all the seeds. The Java program that
> does the inserts uses buffered I/O to read the data from a local
> file and runs on the same machine as the HMaster. To give you an idea
> of the Java code that does the inserts, here is a snippet of the loop.
>
>  while ((url = seedReader.readLine()) != null) {
>       try {
>         BatchUpdate update = new BatchUpdate(new Text(md5(normalizedUrl)));
>         update.put(new Text("url:"), getBytes(url));
>         update.put(new Text("site:"), getBytes(new URL(url).getHost()));
>         update.put(new Text("status:"), getBytes(status));
>         seedlist.commit(update); // seedlist is the HTable
>        }
> ....
> ....
>
> Is there a way to tune HBase to achieve better I/O speeds?
> Ideally I would like to reduce the total insert time to less than 15
> min, i.e. achieve an insert speed of around 4500 rows/sec or more.
>
> Thanks
> -Ankur
>
>
>   

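As a rough cross-check of the figures quoted in the thread above, here is a small sketch of the throughput arithmetic. The class and method names are illustrative and not from the thread; the inputs (4.8 million rows, 50 min, 15 min) are the numbers given in the original post. Under those inputs, 50 min works out to about 1,600 rows/sec, and finishing in 15 min would need roughly 5,300 rows/sec, somewhat above the 4500 rows/sec mentioned.

```java
// Hypothetical helper, not part of HBase: converts a row count and a
// wall-clock duration into an average insert rate.
public class InsertThroughput {

    // Average rows per second needed to insert `rows` rows in `minutes` minutes.
    static double rowsPerSecond(long rows, double minutes) {
        return rows / (minutes * 60.0);
    }

    public static void main(String[] args) {
        long rows = 4_800_000L; // "approx 4.8 million rows" from the post

        // Observed run: about 50 minutes for the full load.
        System.out.printf("observed: %.0f rows/sec%n", rowsPerSecond(rows, 50));

        // Target: the full load in under 15 minutes.
        System.out.printf("target:   %.0f rows/sec%n", rowsPerSecond(rows, 15));
    }
}
```

The same helper can be used to sanity-check any other load size or time budget before tuning.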
