hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Recovering HBase after HDFS Corruption
Date Wed, 31 Dec 2008 09:28:34 GMT
> From: g00dn3ss <g00dn3ss@gmail.com>
> I guess it's missing some important file that I deleted
> when doing my fsck. I guess HBase also has problems if
> either of the data or index files is missing for a MapFile?

If the data file for a MapFile is gone, recovery is not
possible. If the index file is missing, it should be
regenerated on deployment. There may be a bug that prevents
this in some cases, but I am confident that will be resolved
very soon.

> I have a more general question about the HBase
> architecture.  It seems like HBase is deleting and
> rewriting large portions of the table's data.  This
> seems to introduce a reliability concern that multiplies
> any concerns about the reliability of the DFS itself.

One of the fundamental insights that underlie MapReduce and
Bigtable clones such as Hadoop and HBase (Hadoop DFS can be
considered a very loosely structured database) is that, with
modern hardware, seek times dominate when updating very large
data sets. It is much more efficient to log updates and then
periodically merge them with the initial data, rewriting the
whole database, than it is to use an index such as a B-tree
and seek all over the place attempting to record the same
scale of updates as a set of point writes.
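
To make the log-then-merge idea concrete, here is a minimal
sketch in Python. It is not HBase's actual compaction code; the
function name compact and the in-memory lists are hypothetical
stand-ins for sorted files on disk. The point is only that the
merge walks both inputs once, front to back, with no seeking.

```python
def compact(base, log):
    """Merge an update log into sorted base data in one sequential pass.

    base: list of (key, value) pairs, sorted by key (stand-in for a data file).
    log:  list of (key, value) pairs in arrival order; the latest write wins.
    Both structures are illustrative assumptions, not HBase's file formats.
    """
    # Collapse the log so each key keeps only its most recent value.
    latest = {}
    for key, value in log:
        latest[key] = value
    updates = sorted(latest.items())

    # Classic two-way merge: read both inputs sequentially, write one output.
    merged = []
    i = j = 0
    while i < len(base) and j < len(updates):
        if base[i][0] < updates[j][0]:
            merged.append(base[i])
            i += 1
        elif base[i][0] > updates[j][0]:
            merged.append(updates[j])
            j += 1
        else:
            # Same key in both: the logged update replaces the old value.
            merged.append(updates[j])
            i += 1
            j += 1
    merged.extend(base[i:])
    merged.extend(updates[j:])
    return merged


if __name__ == "__main__":
    old = [("a", 1), ("c", 3)]
    edits = [("b", 2), ("c", 30), ("b", 20)]
    print(compact(old, edits))
```

The old data is only replaced once the merged output has been
written in full, so the rewrite costs sequential I/O rather
than one seek per changed row.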

Doug Cutting has a set of slides that shows an example use
case where only 1 day is required to fully update a database
by compaction/rewrite, while a traditional RDBMS would
instead require 1,000 days to update the same via seek and
replace.
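
For a feel of where such a gap comes from, here is a rough
back-of-the-envelope sketch in Python. Every number in it
(seek time, transfer rate, database size, seeks per update) is
an assumption picked for illustration and is not taken from the
slides, so it will not reproduce their exact figures. The ratio
depends heavily on how much of the data is updated and how many
seeks each point write really costs.

```python
SECONDS_PER_DAY = 86_400


def rewrite_seconds(db_bytes, transfer_bytes_per_s):
    """Time to read the whole database once and write it back once."""
    return 2 * db_bytes / transfer_bytes_per_s


def point_update_seconds(n_updates, seeks_per_update, seek_s):
    """Time spent seeking when each update is written in place."""
    return n_updates * seeks_per_update * seek_s


if __name__ == "__main__":
    # All values below are illustrative assumptions.
    db_bytes = 1e12            # 1 TB database
    entry_bytes = 100          # 100-byte entries
    update_fraction = 0.01     # 1% of entries change
    transfer = 10e6            # 10 MB/s sustained transfer
    seek = 0.010               # 10 ms per seek
    seeks_per_update = 2       # hypothetical: one to read, one to write

    n_updates = db_bytes / entry_bytes * update_fraction
    rewrite_days = rewrite_seconds(db_bytes, transfer) / SECONDS_PER_DAY
    seek_days = point_update_seconds(n_updates, seeks_per_update, seek) / SECONDS_PER_DAY

    print(f"full rewrite:  {rewrite_days:.1f} days")
    print(f"point updates: {seek_days:.1f} days")
```

The rewrite cost is fixed by the size of the data, while the
seek cost grows with the number of updates, so the more of the
table you touch, the better compaction looks.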

Hope this helps,

   - Andy

