hbase-user mailing list archives

From Mat Hofschen <hofsc...@gmail.com>
Subject Re: Hbck errors in 0.90.3
Date Thu, 07 Jul 2011 09:39:38 GMT
Hi Stack,

Looking at the old 0.20.4 cluster, the parent region is no longer being
written to (no data on the filesystem). In the META table, however, I cannot
see that this parent region is offlined. Where can I find that key? Why is
the region not being written to any more if there is no offline flag set?

So by copying the data over to the new cloud and using the add_table script,
the information that the region was offlined is lost. I guess this is one of
the problems of copying at the dfs level.
The new cluster is therefore inconsistent at this point, with data written to
the parent region and not to the child regions.

Basically we are trying to accomplish a "blue-green" migration. Once the new
cloud is proven to be stable, we will switch off the old cloud. In the
meantime, though, we need to write all data to both clouds. We therefore
need a defined starting point, with the data from the old cloud copied
to the new cloud with minimum downtime. (We are using the Mozilla
approach to copy over.)

Is there a way to reuse the META table from old cloud and avoid using the
add_table script?

Thanks for your help
Matthias


On Wed, Jul 6, 2011 at 9:37 PM, Stack <stack@duboce.net> wrote:

> On Wed, Jul 6, 2011 at 8:22 AM, Mat Hofschen <hofschen@gmail.com> wrote:
> > With hbck there are a few errors (52). Now I am wondering how to fix these.
> > For example hbck complains about two regions starting with the same key.
> > Next it complains about 2 regions overlapping. From looking at the META
> > table there seems to be a "parent region" and two "child regions from a
> > split". All three regions are registered, producing the two errors.
> > I examined the old 0.20.4 cluster META table, and it has exactly the same
> > problem (only there is no hbck to output the error).
> > So I am assuming that a split on 0.20.4 somehow got into trouble and
> > produced this error.
> >
> > How would I go about fixing these problems? I tried to use Merge but got
> > an NPE.
> >
>
> So the parent is not offline?  You can tell a region is offline by
> fetching it from .META. in the shell and looking for the 'offline'
> attribute.
>
> hbase> get '.META.', 'ROW_OF_PARENT_REGION_IN_META'
>
> ... or just scan .META. and find your region.
>
> If you look at the daughters, do they have any content (Check
> filesystem... look for files)?  I'd think not since we'll be returning
> the parent as the place to write when we look for which region to
> insert into (I suppose the daughters could have data in memory but
> unlikely if we are returning the parent region as place for clients to
> write).
>
> If daughters have no data, remove them from .META. and from filesystem.
>
> hbase> deleteall '.META.', 'DAUGHTER1_IN_META'
> hbase> deleteall '.META.', 'DAUGHTER2_IN_META'
>
> That should take care of that overlap.
>
> Yes, probably an incomplete split over 0.20.4
>
> (I can't believe how many folks ran 0.20.4; it had a serious deadlock
> issue that seemed easy to trigger at least on this end!)
>
> > Also, what happens to a write operation that adds a key that would fit
> > into two regions? Into which region is the key actually inserted? Would
> > it pick the first matching region found in META? Then I am probably in
> > trouble because all three regions contain valid data.
> >
>
>
> It'd likely go into the first.
>
> How do you figure all regions have valid data?
>
>
> > One more question: How does HBase mark regions as offline in META, for
> > example if a split has occurred but the parent has not yet been removed?
> >
>
> See above.  You'll see the 'offline' attribute if region is offline
> (note, we do not show an 'online' attribute in shell if region is
> 'online').
>
> St.Ack
>
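
To make the two hbck complaints from this thread concrete (two regions
starting with the same key, and two regions overlapping), here is a minimal
Python sketch. It is not HBase or hbck code and makes no claim about how hbck
is implemented. The Region tuple, the find_conflicts function, and the sample
keys are all hypothetical. It assumes regions are half-open key ranges
[start, end) and that an empty end key means "end of table".

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """Hypothetical stand-in for one region row in META."""
    name: str
    start: bytes  # inclusive start key; b"" means start of table
    end: bytes    # exclusive end key; b"" means end of table


def find_conflicts(
    regions: List[Region],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (pairs sharing a start key, pairs whose key ranges overlap)."""
    ordered = sorted(regions, key=lambda r: r.start)
    same_start, overlaps = [], []
    for a, b in combinations(ordered, 2):
        # After sorting, a.start <= b.start always holds.
        if a.start == b.start:
            same_start.append((a.name, b.name))
        elif a.end == b"" or b.start < a.end:
            # b begins before a ends, so the ranges intersect.
            overlaps.append((a.name, b.name))
    return same_start, overlaps


# Illustrative data only: a parent that was never offlined plus its daughters.
sample = [
    Region("parent", b"a", b"m"),
    Region("daughter1", b"a", b"g"),
    Region("daughter2", b"g", b"m"),
    Region("neighbour", b"m", b""),
]

if __name__ == "__main__":
    dupes, overlapping = find_conflicts(sample)
    print("same start key:", dupes)
    print("overlapping:", overlapping)
```

With a parent and both daughters registered, this reports one same-start-key
pair (parent and first daughter) and one overlap (parent and second
daughter), which matches the pair of errors described above for a single
incomplete split.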
