hbase-user mailing list archives

From 吴限 <infinity0...@gmail.com>
Subject Re: data loss due to regionserver going down
Date Wed, 27 Jul 2011 16:46:44 GMT
Just by repeatedly checking http://master:60010.
Before Step 2:

  Address                  Start Code     Load
  server4.yun.com:60030    1311785159202  requests=0, regions=10, usedHeap=32, maxHeap=995
  server5.yun.com:60030    1311768553647  requests=18, regions=7, usedHeap=117, maxHeap=995
  Total: servers: 2, requests=18, regions=17

Then, at Step 2, I shut down server4 and waited until the page looked like this:

  Address                  Start Code     Load
  server5.yun.com:60030    1311768553647  requests=18, regions=17, usedHeap=117, maxHeap=995
  Total: servers: 2, requests=18, regions=17

Then I continued with the remaining steps.

On July 28, 2011 at 12:40 AM, Chris Tarnas <cft@email.com> wrote:

> That is strange behavior. How long did you wait between Step 2 and Step 3, and
> what are the results of running
>
> hbase hbck
>
> at step 3?
>
> -chris
>
> On Jul 27, 2011, at 9:23 AM, 吴限 wrote:
>
> > Thanks for your reply. Later I actually ran another experiment, similar to
> > the one I described earlier.
> > Step 1: I inserted some data into HBase.
> > Step 2: I shut down one of the region servers.
> > Step 3: I checked the table and found that some data had been lost.
> > Step 4: I disabled the table and then enabled it again.
> > Step 5: I checked again and found that nothing was lost.
> >
> > If some data didn't exist on the other region server, then how can you
> > explain this?
> >
> > Hope to get your reply. Thanks!
> >
> > 2011/7/28 Chris Tarnas <cft@email.com>
> >
> >> Replication of 1x means no replication. 2x would mean the data exists in
> >> two locations (what it looks like you want). Running with a replication
> >> of 1x is a very bad idea and is pretty much a guaranteed way to get data
> >> loss.
> >>
> >> -chris
> >>
> >> On Jul 27, 2011, at 8:58 AM, 吴限 wrote:
> >>
> >>> Hi everyone. I'd like to run the following data loss scenario by you to
> >>> see if we are doing something obviously wrong with our setup here.
> >>>
> >>> Setup:
> >>>  - cdh3u0
> >>>  - Hadoop 0.20.2
> >>>  - HBase 0.90.1
> >>>  - 1 master node running as NameNode & JobTracker
> >>>  - ZooKeeper quorum
> >>>  - 2 child nodes, each running as DataNode, TaskTracker and RegionServer
> >>>  - dfs.replication is set to 1
> >>>
> >>> First, I inserted some data into HBase a few hours ago.
> >>> Then, after a while, I rebooted one of the region servers and waited until
> >>> the master responded to that. However, when I checked the table using the
> >>> HBase shell (I used the "count" command), I noticed that a huge amount of
> >>> data had been lost.
> >>> After I restarted the regionserver I had rebooted and checked again, I
> >>> found that some of the missing data had come back, but some data was
> >>> still missing.
> >>> Finally, after I disabled the table and then enabled it again, I found
> >>> that all the data was stored in the cluster and none of it had been lost.
> >>>
> >>> This is problematic, since we are supposed to be replicating at 1x, so at
> >>> least one other node should theoretically be able to serve the data that
> >>> the downed regionserver can't.
> >>>
> >>> Questions:
> >>>
> >>>  - How can you guys explain this weird situation?
> >>>  - Is there a way to recover such lost data?
> >>>
> >>> Any tips here are definitely appreciated. I'll be happy to provide more
> >>> information as well.
> >>
> >>
>
>
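Chris's point that a replication of 1x means no replication can be made concrete with a toy model. The sketch below is a simplified, hypothetical illustration, not HDFS code: it places each block on `replication` distinct DataNodes chosen uniformly at random (real HDFS uses a rack-aware placement policy, which is not modeled here) and then counts how many blocks have no live replica when one node is down. The helper names `place_blocks` and `unavailable_fraction` are made up for this example; only the node names and the dfs.replication values come from the thread.

```python
import random


def place_blocks(num_blocks, datanodes, replication, seed=0):
    """Assign each block to `replication` distinct DataNodes.

    Simplified model (an assumption): replicas are chosen uniformly at random.
    Real HDFS uses a rack-aware placement policy, which this sketch does not reproduce.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor cannot exceed the number of DataNodes")
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return [set(rng.sample(datanodes, replication)) for _ in range(num_blocks)]


def unavailable_fraction(placement, failed):
    """Fraction of blocks whose replicas all live on failed DataNodes."""
    # A block is unreadable when its replica set is a subset of the failed nodes.
    lost = sum(1 for replicas in placement if replicas <= failed)
    return lost / len(placement)


if __name__ == "__main__":
    nodes = ["server4", "server5"]  # the two DataNodes from the setup in this thread
    for r in (1, 2):
        placement = place_blocks(1000, nodes, r)
        frac = unavailable_fraction(placement, {"server4"})
        print(f"dfs.replication={r}: {frac:.0%} of blocks unreadable while server4 is down")
```

With dfs.replication=1, every block lives on exactly one of the two DataNodes, so taking either node down makes the blocks stored on it unreadable until it returns (roughly half of them in this random model). With dfs.replication=2, both nodes hold a copy of every block, so a single node failure leaves all blocks readable.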
