hbase-user mailing list archives

From "Geoff Hendrey" <ghend...@decarta.com>
Subject Re: PENDING_CLOSE for too long
Date Sat, 29 Oct 2011 21:35:34 GMT
Sure. I posted the code many weeks back for a tool that will repair holes in .META.

If you do a check on the list, you should find it. I'll send you the latest code for that.
Maybe I made some fixes after I posted the code. Please ping me if I forget. I've used it
to repair huge tables (and fixed subtle bugs in the process) so I'm confident it works.

No matter what anyone tells me, I know hbase is horribly broken for the use case of doing
bulk writes from an mr job. It shits the bed every time you pass a certain scale. For this
reason we've completely rewritten our code so that we use bulkloading. It's way more efficient
and always works.

Please ping me until I send you the code. Otherwise I will forget. 

Sent from my iPhone

On Oct 29, 2011, at 1:39 PM, "Stuart Smith" <stu24mail@yahoo.com> wrote:

> Hello Geoff,
>   I usually don't show up here, since I use CDH, and good form means I should stay on
> But!
>   I've been seeing the same issues for months:
>  - PENDING_CLOSE too long, master tries to reassign - I see a continuous stream of these.
>  - WrongRegionExceptions due to overlapping regions & holes in the regions.
> I just spent all day yesterday cribbing off of St.Ack's check_meta.rb script to write
> a java program to fix up overlaps & holes in an offline fashion (hbase down, directly
> on hdfs), and will start testing next week (cross my fingers!).
> It seems like the pending close messages can be ignored?
> And once I test my tool, and confirm I know a little bit about what I'm doing, maybe
> we could share notes?
> Take care,
>   -stu
> ________________________________
> From: Geoff Hendrey <ghendrey@decarta.com>
> To: user@hbase.apache.org
> Cc: hbase-user@hadoop.apache.org
> Sent: Saturday, September 3, 2011 12:11 AM
> Subject: RE: PENDING_CLOSE for too long
> "Are you having trouble getting to any of your data out in tables?"
> depends what you mean. We see corruptions from time to time that prevent
> us from getting data, one way or another. Today's corruption was regions
> with duplicate start and end rows. We fixed that by deleting the
> offending regions from HDFS, and running add_table.rb to restore the
> meta. The other common corruption is the holes in ".META." that we
> repair with a little tool we wrote. We'd love to learn why we see these
> corruptions with such regularity (seemingly much higher than others on
> the list).
> We will implement the timeout you suggest, and see how it goes.
> Thanks,
> Geoff
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> Stack
> Sent: Friday, September 02, 2011 10:51 PM
> To: user@hbase.apache.org
> Cc: hbase-user@hadoop.apache.org
> Subject: Re: PENDING_CLOSE for too long
> Are you having trouble getting to any of your data out in tables?
> To get rid of them, try restarting your master.
> Before you restart your master, do "HBASE-4126  Make timeoutmonitor
> timeout after 30 minutes instead of 3"; i.e. set
> "hbase.master.assignment.timeoutmonitor.timeout" to 1800000 in
> hbase-site.xml.
> St.Ack
> On Fri, Sep 2, 2011 at 1:40 PM, Geoff Hendrey <ghendrey@decarta.com>
> wrote:
> > In the master logs, I am seeing "regions in transition timed out" and
> > "region has been PENDING_CLOSE for too long, running forced unassign".
> > Both of these log messages occur at INFO level, so I assume they are
> > innocuous. Should I be concerned?
> >
> >
> >
> > -geoff
> >
> >
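
For reference, the timeout change Stack describes in the quoted message could
look roughly like this in hbase-site.xml. This is only a sketch: the property
name and the value 1800000 (30 minutes in milliseconds) come from the thread,
while the surrounding <property> layout is assumed to be the usual one, placed
inside the file's existing <configuration> element.

```xml
<!-- Sketch only: goes inside the existing <configuration> element of hbase-site.xml. -->
<!-- Raises the master's assignment timeout monitor to 30 minutes, per HBASE-4126. -->
<property>
  <name>hbase.master.assignment.timeoutmonitor.timeout</name>
  <!-- 1800000 ms = 30 minutes -->
  <value>1800000</value>
</property>
```

Per the quoted advice, the setting goes in before the master is restarted.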
