hbase-user mailing list archives

From David Koch <ogd...@googlemail.com>
Subject Re: Snapshot clone error
Date Fri, 31 Jan 2014 15:40:15 GMT
Thanks for your reply,

As a matter of fact, running SnapshotInfo with the "-files" option shows that
a lot of files are missing from the snapshot which I did not manage to
restore. It's possible that hbck was run during snapshotting.

**************************************************************
BAD SNAPSHOT: 6659 hfile(s) and 0 log(s) missing.
**************************************************************
78 HFiles (78 in archive), total size 14.3 G (0.00% 0 shared with the
source table)
0 Logs, total size 0

78 files is exactly the number of regions that I found after attempting
restoration.

We followed standard procedure as described in the manual:
http://hbase.apache.org/book/ops.snapshots.html

I will try again and make sure no hbck is intervening.
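
For repeated checks, the summary line shown above can be detected with a
short script. The following is a minimal sketch in Python, assuming the
"BAD SNAPSHOT" line keeps exactly the format printed above. The function
name missing_counts is made up for illustration and is not part of HBase.

```python
import re
from typing import Tuple

# Assumption: the summary line looks exactly like the one in this message:
# "BAD SNAPSHOT: 6659 hfile(s) and 0 log(s) missing."
BAD_SNAPSHOT = re.compile(
    r"BAD SNAPSHOT: (\d+) hfile\(s\) and (\d+) log\(s\) missing"
)


def missing_counts(report: str) -> Tuple[int, int]:
    """Return (missing_hfiles, missing_logs) parsed from SnapshotInfo output.

    Returns (0, 0) when the report contains no BAD SNAPSHOT line.
    """
    match = BAD_SNAPSHOT.search(report)
    if match is None:
        return (0, 0)
    return (int(match.group(1)), int(match.group(2)))
```

The report text would be the captured output of SnapshotInfo run with the
"-snapshot" and "-files" options; capturing that output is left out here.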

/David


On Fri, Jan 31, 2014 at 4:20 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:

> you should use SnapshotInfo with the "-files" option and you'll probably
> see that one snapshot is corrupted.
> in HBase 0.94.15/CDH 4.6 there will be a fix (HBASE-10111) that will
> prevent restoring/cloning a corrupted snapshot.
>
> a corrupted snapshot means that some file contained in the snapshot is
> missing from the .archive
> that situation may happen if you have removed files by hand, or you run
> hbck that sidelined the files or similar
> (unless there is a bug somewhere)
> do you remember the steps that you followed? did you use ExportSnapshot?
> did you move the files by hand to another cluster or similar?
>
> Offline or Online snapshot shouldn't make a difference; the corruption
> probably happened after taking the snapshot.
> You can retry taking the snapshot, and periodically run SnapshotInfo with
> the -files option to verify the state and post the logs in case you get a
> corruption again.
>
> Matteo
>
>
>
> On Fri, Jan 31, 2014 at 3:10 PM, David Koch <ogdude@googlemail.com> wrote:
>
> > Matteo,
> >
> > Thank you for your reply. All clients and servers are using the same
> > version:
> >
> > 14/01/31 16:06:20 INFO util.VersionInfo: HBase 0.94.6-cdh4.5.0
> >
> > Also, the information generated by:
> >
> > hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot
> >
> > is identical for snapshots which I managed to clone and those for which
> > the cloning/restoration failed. Would you advise retrying the snapshot
> > while the table is disabled? Otherwise I'll go with old-fashioned
> > CopyTable or re-import into HBase from HDFS files.
> >
> > Thank you,
> >
> > /David
> >
> >
> > On Fri, Jan 31, 2014 at 2:42 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
> >
> > > the snapshot seems to be corrupted, which version are you running?
> > >
> > > Matteo
> > >
> > >
> > >
> > > On Fri, Jan 31, 2014 at 1:06 PM, David Koch <ogdude@googlemail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > We export an online snapshot of a table to a different cluster.
> > > > Attempting a clone on the destination cluster using:
> > > >
> > > > clone_snapshot 'table_source_snapshot', 'table_dest'
> > > >
> > > > does not work.
> > > >
> > > > The operation times out after a while
> > > >
> > > > ERROR: java.io.IOException: Table 'table_dest' not yet enabled, after
> > > > 1996939ms.
> > > >
> > > > and I see only a fraction of the number of regions in the destination
> > > > table. Table is indicated as "enabled" but I cannot perform any scans
> > > > on it.
> > > >
> > > > The snapshot info returns the following:
> > > >
> > > > Snapshot Info
> > > > ----------------------------------------
> > > >    Name: table_source_snapshot
> > > >    Type: FLUSH
> > > >   Table: table_source
> > > >  Format: 0
> > > > Created: 2014-01-30T13:05:02
> > > >
> > > > Snapshot seems to be intact. What could be the error? Should I take
> > > > an offline snapshot instead? Going via restore/enable instead of clone
> > > > does not seem to work either.
> > > >
> > > > Also, I see the following in the region servers:
> > > >
> > > > 2:24:35.807 PM ERROR
> > > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler
> > > > Failed open of
> > > > region=table_source,\x82\x12Y\x00\xE98C\xEE\xBC\xCC\xE3h\xDAPt\xA6,1366259070788.63ca017ac7cd03e68c35a4da8b56421d.,
> > > > starting to roll back the global memstore size.
> > > > java.io.IOException: java.io.IOException: java.io.FileNotFoundException:
> > > > Unable to open link: org.apache.hadoop.hbase.io.HFileLink
> > > > locations=[hdfs://nameservice1/hbase/table_source/816bb88c6f3524a877f4cb7ce747fec1/t/c3b37dc11e684626a5b464a25a75735c,
> > > > hdfs://nameservice1/hbase/.tmp/table_source/816bb88c6f3524a877f4cb7ce747fec1/t/c3b37dc11e684626a5b464a25a75735c,
> > > > hdfs://nameservice1/hbase/.archive/table_source/816bb88c6f3524a877f4cb7ce747fec1/t/c3b37dc11e684626a5b464a25a75735c]
> > > >
> > > > None of these paths actually exist, however:
> > > >
> > > > hdfs://nameservice1/hbase/.snapshot/table_source_snapshot/816bb88c6f3524a877f4cb7ce747fec1/t/c3b37dc11e684626a5b464a25a75735c
> > > > does exist.
> > > >
> > > > I don't think that's the issue though, since I applied the same steps
> > > > to a smaller table and it worked.
> > > >
> > > > Any advice is appreciated,
> > > >
> > > > Regards,
> > > >
> > > > /David
> > > >
> > >
> >
>
