hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: newbie question on disk usage on node with different disk size
Date Fri, 27 Nov 2009 10:05:41 GMT

When you start hbase in a fresh installation it will use local fs in
/tmp.  The hadoop filesystem libraries we use allow the use of at
least 3 filesystems (local, hdfs, kfs).  Right now you are seeing the
single ZK process and the combined HBase master/regionserver process.

HBase needs the following things out of its filesystem:
- global view - every single regionserver & master MUST see every file
from everyone at all times.  1 hour rsync won't cut it.
- high bandwidth - once you get 3+ servers doing high IO (compaction,
etc), you won't want to rely on a 1 disk NFS.

In theory you can use something like NFS with a common mount directory
on all regionservers/masters. This won't scale, of course. It should _in
theory_ work... You can specify the rootdir with something like
"file:///nfs_mount_path/hbase".  Normally we'd say

The hbase scripts don't boot up or control hadoop at all. You must
provide a working hadoop, then hbase can use it. It may seem a little
"annoying" to have a 2 step process, but the decoupled control makes
our control scripts more generic and suitable for all.
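
As a rough illustration of the rootdir setting discussed above, here is a
minimal sketch of an override in hbase-site.xml that points HBase at an
already-running HDFS. The namenode host and port are placeholders I made up,
not values from this thread; substitute whatever your own hadoop uses.

```xml
<!-- hbase-site.xml: a sketch, not a tested configuration -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- "namenode.example.com" and port 9000 are hypothetical placeholders
         for your own HDFS namenode address -->
    <value>hdfs://namenode.example.com:9000/hbase</value>
    <!-- Shared-mount alternative mentioned in this message
         (should work in theory, not expected to scale):
         <value>file:///nfs_mount_path/hbase</value> -->
  </property>
</configuration>
```

Hadoop has to be up before HBase is started with a setting like this, since
the hbase scripts will not start it for you.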

Good luck out there!

On Fri, Nov 27, 2009 at 1:55 AM, Tux Racer <tuxracer69@gmail.com> wrote:
> Thanks Ryan for your answer.
> Yes, I was mistaken. I also thought that the default install of hbase did a
> one node install of HDFS, and it seems that was wrong:
> a ps auwx|grep java
> shows only two java processes:
> org.apache.hadoop.hbase.zookeeper.HQuorumPeer
> and
> org.apache.hadoop.hbase.master.HMaster
> In the default hbase distribution we have in
> hbase-default.xml
> <name>hbase.rootdir</name>
> <value>file:///tmp/hbase-${user.name}/hbase</value>
> I thought that the dependency of hbase on HDFS was much stronger. From the
> hbase configuration point of view, is the hbase.rootdir parameter the only
> parameter that hooks hbase to HDFS?
> Or does zookeeper also bind hbase to HDFS?
> Is it true to say that hbase does play well with HDFS but that it does play
> well with any POSIX compliant filesystem too?
> For a small cluster, is it a good idea to *not* use HDFS as storage for
> the hbase data?
> If I accept losing one hour of hbase data, is it OK to make hbase.rootdir
> point to a local (ext3) file system on the node and then rsync that
> directory to another node each hour? I guess that rsync is not ideal due to
> the file structure used (it will generate a lot of network traffic).
> Thanks in advance,
> TR
> Ryan Rawson wrote:
>> I think you might be mistaken a bit - HBase layers on top of, and uses
>> hadoop.  HBase uses HDFS for persistence, and thus the balancer config
>> and the other things you point out belong in the hadoop config.
>> 3 nodes is a little light for HDFS... With r=3, there are no spares.
