hbase-user mailing list archives

From Vaibhav Puranik <vpura...@gmail.com>
Subject Re: A region full of data is missing
Date Sat, 14 Nov 2009 01:51:43 GMT
Stack,

We are on 0.20.0.

I will file an issue only if this is not already fixed in 0.20.1. Let me
know.

Regards,
Vaibhav

On Fri, Nov 13, 2009 at 4:17 PM, stack <stack@duboce.net> wrote:

> On Fri, Nov 13, 2009 at 11:20 AM, Vaibhav Puranik <vpuranik@gmail.com
> >wrote:
>
> > Now that we have resolved this problem and figured out that some data could
> > be missing because of a region having a small empty file, we were wondering
> > if there is any automated way we can check all of our regions for this kind
> > of problem.
> >
> Make an issue with the exception.  I thought we handled dirty files by
> logging them and skipping over, but it doesn't seem so.
>
>
>
> > One obvious way would be to check all the regions for a small (228 bytes)
> > file. But is there any other way or other approach to make sure that all of
> > our regions are intact? Should we be running a script periodically that will
> > notify us whether all of our regions are intact or not?
> >
>
> A scan for a non-existent row will touch all regions in the table without
> returning any values.  If it can't load a region, it'll throw an exception.
> You could look for the exception periodically?
>
> St.Ack
>
>
>
>
> >
> > Regards,
> > Vaibhav Puranik
> > Gumgum
> >
> >
> > On Tue, Nov 10, 2009 at 5:50 PM, Vaibhav Puranik <vpuranik@gmail.com>
> > wrote:
> >
> > > This problem is resolved, courtesy of Ryan, JD and Stack. Thank you very
> > > much!
> > >
> > > For the culprit region there were two data files instead of one. The
> > > first data file was around 130 MB. The second file was just 228 bytes.
> > > Because of a bug, this second file gets created during major compaction,
> > > and it prevents the region from loading properly.
> > >
> > > As Ryan asked me to do, I deleted the smaller file and closed the region.
> > > This time HBase reopened it properly and the missing data came back; I
> > > could access it again.
> > >
> > > I am not sure whether the bug is
> > > https://issues.apache.org/jira/browse/HBASE-1686, as that issue has a fix
> > > version of 0.20 but we already have 0.20.0 deployed in production.
> > > But as per Ryan the root cause is the same.
> > >
> > > I guess we need to upgrade our HBase to 0.20.1!
> > >
> > > Thanks again,
> > > Vaibhav Puranik
> > > Gumgum
> > >
> > >
> > > On Tue, Nov 10, 2009 at 2:48 PM, Vaibhav Puranik <vpuranik@gmail.com
> > >wrote:
> > >
> > >> The region name contains the table name, the start key and an id.
> > >> The start key is binary. In our case it is a mixture of a few longs.
> > >> Whenever it is printed, it shows up as Unicode characters that look like
> > >> junk or garbled text. I am not sure whether the shell can interpret it
> > >> correctly.
> > >>
> > >> I don't know how to give this name on the shell console, hence I used
> > >> the HBaseAdmin method.
> > >>
> > >> I kept watching logs while I was doing it. The logs said it closed the
> > >> region and reopened it. It reopened it on the same region server.
> > >>
> > >> I tried accessing data after this, but it didn't work.
> > >>
> > >> .META. table seems to have its entry. The entry looks like:
> > >>
> > >> column=historian:assignment, timestamp=1257889883623, value=Region assigned
> > >> to server domU-12-32-38-01-24-F2.z-2.compute-1.internal,60020,1253581834090
> > >>
> > >> column=historian:open, timestamp=1257889886631, value=Region opened on
> > >> server : domU-12-32-38-01-24-F2.z-2.compute-1.internal
> > >>
> > >> column=info:regioninfo, timestamp=1250406167893, value=REGION => {NAME =>
> > >> 'Visits,\\x00\\x00\\x01\\x22\\xD2\\x1B\\xDF\\xE7\\x00\\x00\\x00\\x00\\x00\\x02\\xAF\\xFE,1250406166412',
> > >> STARTKEY => '\\x00\\x00\\x01\\x22\\xD2\\x1B\\xDF\\xE7\\x00\\x00\\x00\\x00\\x00\\x02\\xAF\\xFE',
> > >> ENDKEY => '\\x00\\x00\\x01\\x22\\xFC\\x27\\x0F8\\x00\\x00\\x00\\x00\\x00\\x05X:',
> > >> ENCODED => 1887697866, TABLE => {{NAME => 'Visits', FAMILIES => [{NAME => 'data',
> > >> VERSIONS => '3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536',
> > >> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
> > >>
> > >> column=info:server, timestamp=1257889886630, value=10.255.43.0:60020
> > >>
> > >> column=info:serverstartcode, timestamp=1257889886630, value=1253581834090
> > >>
> > >> Regards,
> > >> Vaibhav
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Tue, Nov 10, 2009 at 2:28 PM, stack <stack@duboce.net> wrote:
> > >>
> > >>> You couldn't run the shell?
> > >>>
> > >>> So, region closed and opened somewhere else?  Open on another regionserver
> > >>> and you still can't get data out of it?
> > >>>
> > >>> St.Ack
> > >>>
> > >>>
> > >>> On Tue, Nov 10, 2009 at 2:11 PM, Vaibhav Puranik <vpuranik@gmail.com>
> > >>> wrote:
> > >>>
> > >>> > Stack,
> > >>> >
> > >>> > I tried doing HBaseAdmin.closeRegion with the binary region name.
> > >>> >
> > >>> > It closed the region and reopened it. But we still cannot access the
> > >>> > data.
> > >>> >
> > >>> > I guess trying to read it back from the data file is the only option
> > >>> > left, right?
> > >>> >
> > >>> > Regards,
> > >>> > Vaibhav
> > >>> >
> > >>> > On Tue, Nov 10, 2009 at 12:56 PM, stack <stack@duboce.net> wrote:
> > >>> >
> > >>> > > On Mon, Nov 9, 2009 at 6:40 PM, Vaibhav Puranik <vpuranik@gmail.com>
> > >>> > > wrote:
> > >>> > >
> > >>> > > >  Does that mean the region is
> > >>> > > > open and needs to be closed?
> > >>> > > >
> > >>> > > It means region should be open... especially if it's the message the
> > >>> > > regionserver is passing back to the Master reporting successful open.
> > >>> > > Maybe check the regionserver log to see if anything happened with the
> > >>> > > region subsequently?
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > > > All the other regions seem to have one file in their data directory.
> > >>> > > > This region has two files in its data directory.
> > >>> > > > Is that right?
> > >>> > > >
> > >>> > >
> > >>> > > Over time, varies.  These are the files that carry the data.  When the
> > >>> > > number hits a threshold, they are compacted into one file.
> > >>> > >
> > >>> > > So, did close work?
> > >>> > >
> > >>> > > If not, can you find the region in the filesystem?  If so, and if
> > >>> > > you're any good w/ ruby, see the add_table.rb script in head of the
> > >>> > > 0.20 branch.  See how it can read a region and add an entry for it to
> > >>> > > .META.  You might be able to hack it up to do the one region if the
> > >>> > > close doesn't work.
> > >>> > >
> > >>> > > St.Ack
> > >>> > >
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>
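
The check for a stray 228-byte store file that comes up in this thread could be scripted roughly as follows. This is only a sketch under stated assumptions: it expects a recursive listing in the style of `hadoop fs -lsr` (size in the fifth column, path last), and it assumes store files sit at `<rootdir>/<table>/<region>/<family>/<file>`. The names `parse_lsr` and `suspect_regions` and the default root `/hbase` are illustrative and not part of HBase.

```python
from collections import defaultdict

SUSPECT_SIZE = 228  # bytes; size of the stray file reported in this thread


def parse_lsr(lines):
    """Yield (path, size) for each regular file in an lsr-style listing.

    Assumed columns: perms, replication, owner, group, size, date, time, path.
    """
    for line in lines:
        fields = line.split(None, 7)
        # Skip malformed lines and directories (permissions start with 'd').
        if len(fields) != 8 or fields[0].startswith("d"):
            continue
        yield fields[7].strip(), int(fields[4])


def suspect_regions(entries, root="/hbase", max_size=SUSPECT_SIZE):
    """Map 'table/region' to store files that are no larger than max_size."""
    flagged = defaultdict(list)
    prefix = root.rstrip("/") + "/"
    for path, size in entries:
        if not path.startswith(prefix) or size > max_size:
            continue
        parts = path[len(prefix):].split("/")
        # Only table/region/family/file paths are store files (assumed layout).
        if len(parts) != 4:
            continue
        table, region, family, name = parts
        flagged[table + "/" + region].append((family + "/" + name, size))
    return dict(flagged)
```

Run against a listing of the HBase root directory, this would give a short list of regions to inspect by hand. A small file is only a hint that something may be wrong, not proof of a broken region.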

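
The binary start key quoted in this thread is printed with `\xNN` escapes mixed with literal characters. A rough sketch of turning such a printed key back into raw bytes, and reading it as big-endian 8-byte longs, might look like the following. It assumes the doubled backslashes seen in the `.META.` dump have been collapsed to single ones and that every `\xNN` sequence is an escape; `decode_binary_string` and `as_longs` are hypothetical helper names.

```python
import re
import struct

_HEX_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def decode_binary_string(text):
    """Convert text such as r'\x00\x01AB' back into the raw bytes it shows."""
    raw = text.encode("latin-1")
    # Replace each \xNN escape with the single byte it stands for.
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def as_longs(key):
    """Interpret a key as consecutive big-endian signed 64-bit integers."""
    if len(key) % 8 != 0:
        raise ValueError("key length is not a multiple of 8 bytes")
    return struct.unpack(">%dq" % (len(key) // 8), key)


start_key = decode_binary_string(
    r"\x00\x00\x01\x22\xD2\x1B\xDF\xE7\x00\x00\x00\x00\x00\x02\xAF\xFE"
)
print(as_longs(start_key))
```

The raw bytes are what a client API call would need as the region or row name; the tuple of longs is just a readable view of the same key.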