hbase-user mailing list archives

From Nico Guba <ng...@mac.com>
Subject Re: data loss due to regionserver going down
Date Thu, 28 Jul 2011 05:50:38 GMT
Very interesting.  What is a good value where there is not too much of a trade-off in performance?


I'd imagine that setting this too high could create a very 'chatty' cluster.

On 28 Jul 2011, at 00:33, Jeff Whiting wrote:

> Replication needs to be higher than 1. If you have a node which is running both DataNode and
> HRegionServer and then shut it down, you WILL lose all the data that the DataNode was holding,
> because no one else on the cluster has it. HBase relies on HDFS for the replication of data and
> does NOT have its own data replication mechanism, unlike Cassandra or Voldemort. If you set the
> HDFS replication factor to 3, then when you shut down your node 2 other nodes will have the data
> and HBase will be able to serve that data for you.
> 
> You can think of each DataNode as a hard drive. Having a replication factor of 1 means the data
> is only on one hard drive and if you unplug the hard drive that data will be lost. Having a
> replication factor greater than 1 is like having multiple hard drives in a RAID 1 (mirrored)
> array. If you unplug one of the hard drives the data is still on the other ones and nothing is
> lost.
> 
> ~Jeff
> 
> On 7/27/2011 10:35 AM, 吴限 wrote:
>> Here is my hbase-site.xml:
>> <configuration>
>>    <property>
>>        <name>hbase.cluster.distributed</name>
>>        <value>true</value>
>>    </property>
>>    <property>
>>        <name>hbase.rootdir</name>
>>        <value>hdfs://server3.yun.com:54310/hbase</value>
>>        <description>The directory shared by region servers.
>>        </description>
>>    </property>
>>    <property>
>>        <name>hbase.zookeeper.quorum</name>
>>        <value>server3.yun.com</value>
>>    </property>
>>    <property>
>>        <name>dfs.replication</name>
>>        <value>1</value>
>>    </property>
>> </configuration>
>> 
>> 
>> 2011/7/28 Stack <stack@duboce.net>
>> 
>>> On Wed, Jul 27, 2011 at 8:58 AM, 吴限 <infinity0222@gmail.com> wrote:
>>>> Setup:
>>>>  -cdh3u0
>>>>  - Hadoop 0.20.2
>>> You are using the hadoop from cdh3u0?
>>> 
>>> 
>>>>  - dfs.replication is set to 1
>>>> 
>>> You will lose data if a machine goes away. You have two machines but
>>> only one instance of each data block; think of it as half of your data
>>> on one node and the rest on another.  If you kill one machine, half
>>> your data is gone.
>>> 
>>> 
>>>> After I restarted the regionserver which I had rebooted and checked again,
>>>> I found that some of the missing data had come back, but there was still
>>>> some data which hadn't been found.
>>> 
>>> I wonder what was going on here that we didn't see it all restored.
>>> 
>>> 
>>>> This is problematic since we are supposed to
>>>> replicate at x1, so at least one other node should be able to
>>>> theoretically serve the *data* that the downed regionserver can't.
>>>> 
>>> No.  The behavior you describe would come with replication of 2, not 1.
>>> 
>>> St.Ack
>>> 
> 
> -- 
> Jeff Whiting
> Qualtrics Senior Software Engineer
> jeffw@qualtrics.com
> 
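
For reference, the replication factor discussed in this thread is the HDFS property dfs.replication, which normally lives in hdfs-site.xml. The fragment below is only a minimal sketch, assuming the value 3 that Jeff mentions (it is also the stock HDFS default). The right value for a given cluster is a judgment call, and on a cluster with only two DataNodes HDFS can place at most two replicas of each block.

```xml
<configuration>
    <!-- Assumption: 3 is the value suggested in the thread, not a tuned recommendation. -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>
```

This setting applies to files written after the change. Files already in HDFS keep the replication factor they were created with until it is changed explicitly.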

