hbase-user mailing list archives

From Jeff Whiting <je...@qualtrics.com>
Subject Re: data loss due to regionserver going down
Date Wed, 27 Jul 2011 23:33:38 GMT
Replication needs to be higher than 1. If you have a node that is running both a DataNode and an
HRegionServer and you shut it down, you WILL lose all the data that the DataNode was holding,
because no other node in the cluster has a copy. HBase relies on HDFS for the replication of data
and does NOT have its own data replication mechanism, unlike Cassandra or Voldemort. If you set the
HDFS replication factor to 3, then when you shut down your node, 2 other nodes will have the data
and HBase will be able to serve that data for you.
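
As a rough sketch of that change (assuming the property stays in the configuration that HBase reads
as an HDFS client, and that the cluster has at least 3 DataNodes to hold the copies), the
dfs.replication entry would look something like this:

```xml
<!-- Sketch only: ask HDFS to keep 3 copies of each block written by this client. -->
<!-- Assumes at least 3 DataNodes; with fewer, blocks will stay under-replicated. -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
```

Keep in mind that dfs.replication applies to files created after the change. Files that were
already written with a replication factor of 1 keep that factor until it is raised explicitly, for
example with the hadoop fs -setrep command.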

You can think of each DataNode as a hard drive. Having a replication factor of 1 means the data is
only on one hard drive, and if you unplug that hard drive the data will be lost. Having a
replication factor greater than 1 is like having multiple hard drives in a RAID 1 (mirrored) array.
If you unplug one of the hard drives, the data is still on the other ones and nothing is lost.


On 7/27/2011 10:35 AM, 吴限 wrote:
> Here is my hbase-site.xml:
> <configuration>
>     <property>
>         <name>hbase.cluster.distributed</name>
>         <value>true</value>
>     </property>
>     <property>
>         <name>hbase.rootdir</name>
>         <value>hdfs://server3.yun.com:54310/hbase</value>
>         <description>The directory shared by region servers.
>         </description>
>     </property>
>     <property>
>         <name>hbase.zookeeper.quorum</name>
>         <value>server3.yun.com</value>
>     </property>
>     <property>
>         <name>dfs.replication</name>
>         <value>1</value>
>     </property>
> 2011/7/28 Stack <stack@duboce.net>
>> On Wed, Jul 27, 2011 at 8:58 AM, 吴限 <infinity0222@gmail.com> wrote:
>>> Setup:
>>>   -cdh3u0
>>>   - Hadoop 0.20.2
>> You are using the hadoop from cdh3u0?
>>>   - dfs.replication is set to 1
>> You will lose data if a machine goes away. You have two machines but
>> only one instance of each data block; think of it as half of your data
>> on one node and the rest on another.  If you kill one machine, half
>> your data is gone.
>>>  After I restarted the regionserver which I had rebooted and checked again,
>>>  I found that some of the missing data had come back, but there was still
>>> some data which hadn't been found yet.
>> I wonder what was going on here that we didn't see it all restored.
>>>  This is problematic since we are supposed to
>>> replicate at x1, so at least one other node should be able to
>>> theoretically serve the *data* that the downed regionserver can't.
>> No.  The behavior you describe would come with replication of 2, not 1.
>> St.Ack

Jeff Whiting
Qualtrics Senior Software Engineer
