hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Recovering hbase after a failure
Date Thu, 02 Oct 2014 18:26:18 GMT
On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <buckleyr@oclc.org> wrote:

> Also, once the original /hbase got mv'd, a few of the region servers did
> some flushes before they aborted.   Those RSs actually created a new
> /hbase, with new table directories, but only containing the data from the
> flush.


Sounds like we should be creating flush files with createNonRecursive (even
though it's deprecated)
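
For anyone reading this in the archive, here is a rough illustration of why a non-recursive create would help. As I understand the Hadoop API, FileSystem.create() makes any missing parent directories, while createNonRecursive() fails if the parent does not already exist. The sketch below is only a local-filesystem analogy in Python, not HBase or Hadoop code; the function names, the directory layout, and the file name are made up for the example.

```python
import os
import tempfile


def write_flush_recursive(path, data):
    """Analogy for a recursive create: missing parents are created first."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)


def write_flush_non_recursive(path, data):
    """Analogy for a non-recursive create: a missing parent is an error."""
    # open() never creates parent directories, so a vanished parent
    # raises FileNotFoundError instead of quietly rebuilding the tree.
    with open(path, "wb") as f:
        f.write(data)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "hbase")  # stand-in for /hbase
        family_dir = os.path.join(root, "t1", "region-a", "cf")  # hypothetical layout
        os.makedirs(family_dir)

        # Simulate the accident: the root is moved out from under the writer.
        os.rename(root, os.path.join(tmp, "hbase.moved"))

        flush_file = os.path.join(family_dir, "flush-0001")  # hypothetical name
        try:
            write_flush_non_recursive(flush_file, b"rows")
            non_recursive_failed = False
        except FileNotFoundError:
            non_recursive_failed = True
        root_after_non_recursive = os.path.exists(root)

        # The recursive variant succeeds and recreates a fresh, nearly empty root.
        write_flush_recursive(flush_file, b"rows")
        root_after_recursive = os.path.exists(root)

        return non_recursive_failed, root_after_non_recursive, root_after_recursive
```

With the non-recursive variant the flush fails loudly and no new root appears; with the recursive variant a second, mostly empty root is created, which matches what Ron saw.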


On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <buckleyr@oclc.org> wrote:

> FWIW, in case something like this happens to someone else.
>
> To recover this, the first thing I tried was to just mv the /hbase
> directory back.   That doesn’t work.
>
> To get going again, we had to completely shut down and restart.
>
> Also, once the original /hbase got mv'd, a few of the region servers did
> some flushes before they aborted.   Those RSs actually created a new
> /hbase, with new table directories, but only containing the data from the
> flush.
>
>
> -----Original Message-----
> From: Buckley,Ron
> Sent: Thursday, October 02, 2014 2:09 PM
> To: hbase-user
> Subject: RE: Recovering hbase after a failure
>
> Nick,
>
> Good ideas. Compared file and region counts with our DR site. Things
> look OK. Going to run some rowcounters too.
>
> Feels like we got off easy.
>
> Ron
>
> -----Original Message-----
> From: Nick Dimiduk [mailto:ndimiduk@gmail.com]
> Sent: Thursday, October 02, 2014 1:27 PM
> To: hbase-user
> Subject: Re: Recovering hbase after a failure
>
> Hi Ron,
>
> Yikes!
>
> Do you have any basic metrics regarding the amount of data in the system
> -- size of store files before the incident, number of records, &c?
>
> You could sift through the HDFS audit log and see if any files that were
> there previously have not been restored.
>
> -n
>
> On Thu, Oct 2, 2014 at 10:18 AM, Buckley,Ron <buckleyr@oclc.org> wrote:
>
> > We just had an event where, on our main hbase instance, the /hbase
> > directory got moved out from under the running system (Human error).
> >
> > HBase was really unhappy about that, but we were able to recover it
> > fairly easily and get back going.
> >
> > As far as I can tell, all the data and tables came back correct. But,
> > I'm pretty concerned that there may be some hidden corruption or data
> > loss.
> >
> > 'hbase hbck'  runs clean and there are no new complaints in the logs.
> >
> > Can anyone think of anything else we should look at?
> >
> >
> >
> >
> >
>
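
The file-count comparison against a DR site mentioned in the quoted thread could be scripted roughly as follows. This is a minimal sketch, assuming you already have plain lists of file paths from each cluster (for example from a recursive HDFS listing). The function names, the `depth` parameter, and the example paths are hypothetical, and the directory layout under the root depends on the HBase version. Store file counts can also differ legitimately between clusters because of compactions and replication lag, so treat a mismatch as a prompt to look closer rather than proof of loss.

```python
from collections import Counter


def count_files_by_prefix(paths, root="/hbase", depth=1):
    """Count files grouped by the first `depth` path components under root."""
    prefix = root.rstrip("/") + "/"
    counts = Counter()
    for p in paths:
        if not p.startswith(prefix):
            continue  # outside the root we care about
        parts = p[len(prefix):].split("/")
        if len(parts) <= depth:
            continue  # too shallow to be a file below the grouping level
        counts["/".join(parts[:depth])] += 1
    return counts


def diff_counts(primary, replica):
    """Return {group: (primary_count, replica_count)} where the counts differ."""
    groups = sorted(set(primary) | set(replica))
    return {
        g: (primary.get(g, 0), replica.get(g, 0))
        for g in groups
        if primary.get(g, 0) != replica.get(g, 0)
    }
```

Usage, with made-up paths:

```python
primary = ["/hbase/t1/r1/cf/f1", "/hbase/t1/r1/cf/f2", "/hbase/t2/r9/cf/f1"]
replica = ["/hbase/t1/r1/cf/f1", "/hbase/t2/r9/cf/f1"]
mismatches = diff_counts(count_files_by_prefix(primary),
                         count_files_by_prefix(replica))
```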



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
