hbase-user mailing list archives

From Akhtar Muhammad Din <akhtar.m...@gmail.com>
Subject Hbase Performance Issue
Date Sat, 04 Jan 2014 20:17:53 GMT
I have been running a MapReduce job that joins two datasets, 1.3 GB and 4 GB
in size. The join is done on the reduce side, and the output is written to
either HBase or HDFS, depending on configuration. The problem is that writing
the processed data to HBase takes about 60-80 minutes, whereas writing the
same data to HDFS takes only 3-5 minutes. I would like to bring the HBase
write time down to 1-2 minutes.
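
To put the gap in numbers, here is a rough sketch that uses only the timings quoted above. The helper name `ratio_range` is illustrative, and the ranges are treated as independent best and worst cases, which is an assumption.

```python
# Timings quoted in the message, in minutes, as (low, high) ranges.
HBASE_MIN = (60, 80)   # observed HBase write time
HDFS_MIN = (3, 5)      # observed HDFS write time
TARGET_MIN = (1, 2)    # desired HBase write time


def ratio_range(slow, fast):
    """Return the (smallest, largest) ratio slow/fast over both ranges."""
    smallest = slow[0] / fast[1]  # best case for slow vs worst case for fast
    largest = slow[1] / fast[0]   # worst case for slow vs best case for fast
    return smallest, largest


# How many times slower HBase currently is than HDFS.
hbase_vs_hdfs = ratio_range(HBASE_MIN, HDFS_MIN)

# Speedup needed to get from the current HBase time to the target.
needed_speedup = ratio_range(HBASE_MIN, TARGET_MIN)

print("HBase vs HDFS: %.1fx to %.1fx slower" % hbase_vs_hdfs)
print("Speedup needed: %.0fx to %.0fx" % needed_speedup)
```

By these figures the HBase write is at least 12 times slower than the HDFS write, and the 1-2 minute target is lower than the 3-5 minutes currently measured for HDFS.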

I am using Amazon EC2 instances. I launched a cluster of 3 nodes and later
one of 10 nodes, and have tried both c3.4xlarge and c3.8xlarge instances.

Writing to HDFS gets significantly faster as I use a cluster with more nodes
and higher specifications, but with HBase there was no significant change in
performance.

I have gone through various posts and articles and have read the HBase book
to try to solve this performance issue, but have not succeeded so far.
Here are the things I have tried:

*Client Side*
- Turned off writing to WAL
- Experimented with write buffer size
- Turned off auto flush on table
- Used cache, experimented with different sizes

*HBase Server Side*
- Increased region server heap size to 8 GB
- Experimented with the handler count
- Increased MemStore flush size to 512 MB
- Experimented with hbase.hregion.max.filesize, tried different sizes

There are many other parameters I have tried, following suggestions from
different sources, but nothing has worked so far.

Your help will be really appreciated.

Akhtar Muhammad Din
