hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Recovering hbase after a failure
Date Thu, 02 Oct 2014 21:13:23 GMT
It's not the round trip, it's the atomicity of the operation. Consider a
rename happening between the isDirectory call and the subsequent create
call. What would you have achieved by introducing the isDirectory check? I
skimmed the FileSystem javadoc for 2.4.1, and none of the 13 non-deprecated
create methods provides the same semantics as createNonRecursive. A shame.
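
To make the race concrete, here is a minimal sketch using java.nio.file on a
local file system as a stand-in for HDFS. It is an analogy, not the Hadoop
FileSystem API or the HBase flush code: the class name, method names, and
paths are all hypothetical. Files.createFile fails when the parent directory
is missing, which is roughly the behavior createNonRecursive gives, while
createDirectories followed by createFile mimics a recursive create.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class FlushCreateSketch {

    // Check-then-create: the parent is verified first, but a rename can land
    // in the window before the create, and the recursive create then quietly
    // rebuilds the whole path. raceWindow simulates that concurrent rename.
    static Path checkThenRecursiveCreate(Path file, Runnable raceWindow) throws IOException {
        Path parent = file.getParent();
        if (!Files.isDirectory(parent)) {
            throw new NoSuchFileException(parent.toString());
        }
        raceWindow.run();
        Files.createDirectories(parent); // what a recursive create does implicitly
        return Files.createFile(file);
    }

    // Non-recursive create: one call that fails if the parent is missing.
    static Path nonRecursiveCreate(Path file) throws IOException {
        return Files.createFile(file);
    }

    // Builds a Runnable that moves a directory away, wrapping checked exceptions.
    static Runnable renameOf(Path from, Path to) {
        return () -> {
            try {
                Files.move(from, to);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        // Scenario 1: the check passes, the root is moved, the create still succeeds.
        Path root1 = Files.createTempDirectory("flush-sketch");
        Path hbase1 = root1.resolve("hbase");
        Path table1 = Files.createDirectories(hbase1.resolve("t1"));
        checkThenRecursiveCreate(table1.resolve("flush-1"),
                renameOf(hbase1, root1.resolve("hbase.moved")));
        System.out.println("stray root recreated: " + Files.isDirectory(hbase1));

        // Scenario 2: same rename, but the non-recursive create refuses to proceed.
        Path root2 = Files.createTempDirectory("flush-sketch");
        Path hbase2 = root2.resolve("hbase");
        Path table2 = Files.createDirectories(hbase2.resolve("t1"));
        renameOf(hbase2, root2.resolve("hbase.moved")).run();
        try {
            nonRecursiveCreate(table2.resolve("flush-1"));
        } catch (NoSuchFileException e) {
            System.out.println("create refused, parent missing: " + e.getFile());
        }
        System.out.println("stray root recreated: " + Files.isDirectory(hbase2));
    }
}
```

In the first scenario the isDirectory check succeeds and a stray root still
appears, which is why the extra check buys nothing. In the second, the create
itself fails, so the writer learns that the directory is gone.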


On Thu, Oct 2, 2014 at 11:36 AM, Esteban Gutierrez <esteban@cloudera.com>
wrote:

> I'm not sure we should use the deprecated API. Calling isDirectory
> shouldn't be that expensive in the NN, but it will add another RPC call per
> flush.
>
> esteban.
>
> --
> Cloudera, Inc.
>
>
> On Thu, Oct 2, 2014 at 11:26 AM, Andrew Purtell <apurtell@apache.org>
> wrote:
>
> > On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <buckleyr@oclc.org> wrote:
> >
> > > Also, once the original /hbase got mv'd, a few of the region servers did
> > > some flushes before they aborted. Those RSs actually created a new
> > > /hbase, with new table directories, but only containing the data from the
> > > flush.
> >
> >
> > Sounds like we should be creating flush files with createNonRecursive (even
> > though it's deprecated)
> >
> >
> > On Thu, Oct 2, 2014 at 11:17 AM, Buckley,Ron <buckleyr@oclc.org> wrote:
> >
> > > FWIW, in case something like this happens to someone else.
> > >
> > > To recover this, the first thing I tried was to just mv the /hbase
> > > directory back.   That doesn’t work.
> > >
> > > To get going again, we had to completely shut down and restart.
> > >
> > > Also, once the original /hbase got mv'd, a few of the region servers did
> > > some flushes before they aborted. Those RSs actually created a new
> > > /hbase, with new table directories, but only containing the data from the
> > > flush.
> > >
> > >
> > > -----Original Message-----
> > > From: Buckley,Ron
> > > Sent: Thursday, October 02, 2014 2:09 PM
> > > To: hbase-user
> > > Subject: RE: Recovering hbase after a failure
> > >
> > > Nick,
> > >
> > > Good ideas. Compared file and region counts with our DR site. Things
> > > look OK. Going to run some rowcounters too.
> > >
> > > Feels like we got off easy.
> > >
> > > Ron
> > >
> > > -----Original Message-----
> > > From: Nick Dimiduk [mailto:ndimiduk@gmail.com]
> > > Sent: Thursday, October 02, 2014 1:27 PM
> > > To: hbase-user
> > > Subject: Re: Recovering hbase after a failure
> > >
> > > Hi Ron,
> > >
> > > Yikes!
> > >
> > > Do you have any basic metrics regarding the amount of data in the system
> > > -- size of store files before the incident, number of records, &c?
> > >
> > > You could sift through the HDFS audit log and see if any files that were
> > > there previously have not been restored.
> > >
> > > -n
> > >
> > > On Thu, Oct 2, 2014 at 10:18 AM, Buckley,Ron <buckleyr@oclc.org> wrote:
> > >
> > > > We just had an event where, on our main hbase instance, the /hbase
> > > > directory got moved out from under the running system (Human error).
> > > >
> > > > HBase was really unhappy about that, but we were able to recover it
> > > > fairly easily and get back going.
> > > >
> > > > As far as I can tell, all the data and tables came back correct. But,
> > > > I'm pretty concerned that there may be some hidden corruption or data
> > > > loss.
> > > >
> > > > 'hbase hbck'  runs clean and there are no new complaints in the logs.
> > > >
> > > > Can anyone think of anything else we should look at?
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
