hbase-user mailing list archives

From "Goel, Ankur" <Ankur.G...@corp.aol.com>
Subject RE: HBase performance tuning
Date Tue, 25 Mar 2008 13:51:47 GMT
Hi again,

A couple of issues that I ran into are as follows:

1. If I terminate the local client (the Java program used for the
   inserts; please see the post before this one), HBase goes into an
   inconsistent state. Though the tables are still shown as available,
   an attempt to drop the table gives the message "Table does not
   exist". If I try to truncate the table, I get an IOException on the
   client from the 'TableOperation' class.
   It looks like the abrupt closure of the socket connection from the
   client side corrupted the META information and also the data files.
   Is this a known issue? If not, I can reproduce it and file a JIRA
   issue with a stack trace and description.

2. Connecting to the region servers from a remote location and
   inserting data from a file local to the remote client gave an insert
   speed of 3 rows/sec = 180 rows/min! This is terribly slow when the
   available bandwidth is 2 Mbps. Any ideas on what the bottleneck
   could be here?
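
A rough check of whether the 2 Mbps link itself could be the limit, as a
sketch only: it assumes each row carries at most about 2100 bytes of
cell data (the sum of the url:, site: and status: max lengths from the
original message below) and it ignores protocol overhead. The class and
method names are illustrative and not part of HBase.

```java
import java.util.Locale;

// Back-of-the-envelope link budget for the remote insert case.
final class LinkBudget {
    /** Fraction of the link consumed by row payload alone (0.0 to 1.0). */
    static double linkUtilisation(double rowsPerSec, int bytesPerRow, double linkBitsPerSec) {
        return rowsPerSec * bytesPerRow * 8 / linkBitsPerSec;
    }

    /** Average wall-clock time spent per row, in milliseconds. */
    static double millisPerRow(double rowsPerSec) {
        return 1000.0 / rowsPerSec;
    }

    public static void main(String[] args) {
        double rowsPerSec = 3;          // observed remote insert rate
        int bytesPerRow = 2100;         // ASSUMPTION: 1500 + 500 + 100 max lengths
        double linkBitsPerSec = 2_000_000; // 2 Mbps as stated

        System.out.println(String.format(Locale.ROOT, "link utilisation: %.2f%%",
                100 * linkUtilisation(rowsPerSec, bytesPerRow, linkBitsPerSec)));
        System.out.println(String.format(Locale.ROOT, "time per row: %.0f ms",
                millisPerRow(rowsPerSec)));
    }
}
```

Under those assumptions the payload uses roughly 2.5% of the link while
each row takes about a third of a second. That would be consistent with
per-row round trips, rather than bandwidth, dominating the time, but it
is only a guess until it is measured.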

Thanks
-Ankur


-----Original Message-----
From: ANKUR GOEL [mailto:ankur.goel@corp.aol.com] 
Sent: Tuesday, March 25, 2008 7:05 PM
To: hbase-user@hadoop.apache.org
Subject: HBase performance tuning

Hi Folks,
I have a table with the following column families in the schema
(the integer here is the max length):
        {"referer_id:", "100"},
        {"url:","1500"},
        {"site:","500"},
        {"status:","100"}

The common attributes for all the above column families are [max
versions: 1,  compression: NONE, in memory: false, block cache enabled:
true, max length: 100, bloom filter: none]

[HBase Configuration]:
   - HDFS runs on 10 machine nodes with 8 GB RAM each and 4 CPU cores.
   - HMaster runs on a different machine than NameNode.
   - There are 9 regionservers configured.
   - Total DFS available = 150 GB.
   - LAN speed is 100 Mbps.

I am trying to insert approx 4.8 million rows and the speed that I get
is around 1500 row inserts per sec (100,000 row inserts per min.).

It takes around 50 min to insert all the seeds. The Java program that
does the inserts uses buffered I/O to read the data from a local file
and runs on the same machine as the HMaster. To give you an idea of the
Java code that does the insert, here is a snapshot of the loop.

 while ((url = seedReader.readLine()) != null) {
      try {
        BatchUpdate update = new BatchUpdate(new Text(md5(normalizedUrl)));
        update.put(new Text("url:"), getBytes(url));
        update.put(new Text("site:"), getBytes(new URL(url).getHost()));
        update.put(new Text("status:"), getBytes(status));
        seedlist.commit(update); // seedlist is the HTable
      }
....
....
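
One way to see how much of the time is spent outside HBase is to time
the same loop with the commit replaced by a no-op. The following is a
minimal sketch under stated assumptions: RowSink is a hypothetical
stand-in for the HTable commit, md5Hex is an illustrative helper (the
md5 and getBytes helpers above are not shown in this message), and the
host is parsed with java.net.URI instead of URL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class InsertTimer {
    /** HYPOTHETICAL stand-in for the HTable commit; not an HBase type. */
    interface RowSink {
        void write(String rowKey, String url, String host);
    }

    /** Hex-encoded MD5 of a string, used here as the row key. */
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(32);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads one URL per line, hands each row to the sink, returns the row count. */
    static long run(BufferedReader in, RowSink sink) {
        long rows = 0;
        try {
            String url;
            while ((url = in.readLine()) != null) {
                sink.write(md5Hex(url), url, URI.create(url).getHost());
                rows++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Synthetic input so the sketch runs without a seed file.
        StringBuilder seeds = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            seeds.append("http://host").append(i).append(".example/page\n");
        }
        long start = System.nanoTime();
        long rows = run(new BufferedReader(new StringReader(seeds.toString())),
                (key, url, host) -> { /* no-op: measures client-side cost only */ });
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println(rows + " rows, " + (long) (rows / seconds) + " rows/sec without commits");
    }
}
```

If the no-op rate is far above 1500 rows/sec, the reading, hashing and
URL parsing are unlikely to be the limit, and the time is going into the
commit path.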

Is there a way to tune HBase to achieve better I/O speeds?
Ideally I would like to reduce the total insert time to less than 15 min,
i.e., achieve an insert speed of around 4500 rows/sec or more.
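
As a quick cross-check of the figures quoted in this message, the
arithmetic can be sketched as below. The class and method names are
illustrative only; the inputs are the 4.8 million rows, 50 min and
15 min stated above.

```java
import java.util.Locale;

// Cross-check of the insert rates quoted in the message.
final class InsertRate {
    /** Whole rows per second needed to load `rows` rows in `minutes` minutes. */
    static long rowsPerSec(long rows, long minutes) {
        return rows / (minutes * 60);
    }

    /** Minutes needed to load `rows` rows at a steady `rowsPerSec`. */
    static double minutesAt(long rows, long rowsPerSec) {
        return rows / (double) rowsPerSec / 60.0;
    }

    public static void main(String[] args) {
        long rows = 4_800_000L;
        System.out.println("current rate:   " + rowsPerSec(rows, 50) + " rows/sec");
        System.out.println("15 min target:  " + rowsPerSec(rows, 15) + " rows/sec");
        System.out.println(String.format(Locale.ROOT,
                "time at 4500/s: %.1f min", minutesAt(rows, 4500)));
    }
}
```

By this arithmetic, 50 min corresponds to about 1600 rows/sec, a 15 min
load needs about 5333 rows/sec, and 4500 rows/sec would take just under
18 min.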

Thanks
-Ankur


