hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: hadoop without append in the absence of puts
Date Wed, 22 Jun 2011 23:36:06 GMT
> From: Andreas Neumann <neunand@gmail.com>
> If we only load data in bulk (that is, via doBulkLoad(), not using
> TableOutputFormat), do we still risk data loss? My understanding is
> that append is needed for the WAL, and the WAL is needed only
> for puts. But bulk loads bypass the WAL.


If you are doing read-only serving of HFiles built by MR and loaded by doBulkLoad, then you
would not need append support.
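To see why the bulk-load path is safe without append, here is a toy model (not HBase code; all class and variable names are invented for illustration): the HFile is already complete and durable in HDFS before the region server is asked to serve it, so a crash loses nothing, whereas an API put sits in the volatile memstore until flushed.

```java
import java.util.*;

// Toy model (not HBase code): why bulk-loaded files need no WAL.
// "Hdfs" stands in for durable storage; the region server serves a set of
// complete files plus a volatile memstore. Bulk load only registers an
// already-durable file, so a crash cannot lose it.
class BulkLoadSketch {
    static class Hdfs { final Map<String, List<String>> files = new HashMap<>(); }

    static class RegionServer {
        final Hdfs hdfs;
        final List<String> servedFiles = new ArrayList<>();
        final List<String> memstore = new ArrayList<>(); // volatile, lost on crash
        RegionServer(Hdfs hdfs) { this.hdfs = hdfs; }

        // Bulk load: the file already sits complete in HDFS; we only
        // register it for serving. No WAL entry is needed.
        void bulkLoad(String hfile) { servedFiles.add(hfile); }

        // API put: buffered in the memstore until flushed; without a
        // durable WAL this edit would vanish in a crash.
        void put(String row) { memstore.add(row); }

        List<String> scan() {
            List<String> out = new ArrayList<>();
            for (String f : servedFiles) out.addAll(hdfs.files.get(f));
            out.addAll(memstore);
            return out;
        }
    }

    // Crash and restart: reopen the same durable files, empty memstore.
    static RegionServer crashAndRecover(RegionServer rs) {
        RegionServer fresh = new RegionServer(rs.hdfs);
        fresh.servedFiles.addAll(rs.servedFiles);
        return fresh;
    }

    public static void main(String[] args) {
        Hdfs hdfs = new Hdfs();
        hdfs.files.put("hfile-1", Arrays.asList("row1", "row2")); // built by MR
        RegionServer rs = new RegionServer(hdfs);
        rs.bulkLoad("hfile-1");
        rs.put("row3"); // API put, memstore only

        RegionServer recovered = crashAndRecover(rs);
        // Bulk-loaded rows survive the crash; the un-WALed put does not.
        assert recovered.scan().equals(Arrays.asList("row1", "row2"));
        System.out.println("survived crash: " + recovered.scan());
    }
}
```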

If you are adding new data to tables via the HBase API, then sooner or later that will change table structure (for example, through region splits), and those changes are recorded via Puts to the META table, which is itself hosted in HBase. Circumstances where those edits can be lost without working append support in HDFS may be rare, but not rare enough in my estimation. Losing edits to META is bad: it can lead to missing regions and hung clients. Human intervention will be necessary, and the time scale for administrative recovery is usually an availability problem.
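The META failure mode can be sketched the same way (again a toy model, not HBase code; the edit strings and class names are invented): a split is recorded as Puts through META's own WAL, and if the filesystem cannot durably sync appends, the buffered tail of the log vanishes on crash, taking the record of the daughter regions with it.

```java
import java.util.*;

// Toy model (not HBase code): why META edits need a durable WAL.
// A region split is recorded as Puts to META; META lives on a region
// server like any other table, so those Puts go through the same WAL.
// Without working append/sync, buffered edits never become durable.
class MetaLossSketch {
    static class Wal {
        final List<String> synced = new ArrayList<>();   // durably on disk
        final List<String> buffered = new ArrayList<>(); // in flight
        final boolean syncWorks; // false = HDFS without working append/sync
        Wal(boolean syncWorks) { this.syncWorks = syncWorks; }
        void append(String edit) {
            buffered.add(edit);
            if (syncWorks) { synced.addAll(buffered); buffered.clear(); }
        }
    }

    // Recovery replays only what actually reached disk.
    static Set<String> recoverMeta(Wal wal) { return new HashSet<>(wal.synced); }

    public static void main(String[] args) {
        Wal wal = new Wal(false); // append/sync not supported
        wal.append("META: region A -> daughter A1");
        wal.append("META: region A -> daughter A2");
        // The region server crashes; the buffered tail never reached disk.
        Set<String> meta = recoverMeta(wal);
        assert meta.isEmpty(); // both daughter regions are now missing
        System.out.println("regions recoverable from META: " + meta.size());
    }
}
```

With `syncWorks` true, both edits land in the durable log and recovery finds both daughter regions; that is the difference working append support makes.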

> For instance, when a region is split, the master must write
> the new meta data to the meta regions. Would that require a WAL
> or rely on append in some other way?

See above.

> Are there other situations where the WAL is needed (or append
> is needed) to avoid data loss?

Deletes? Increments? With these operations you would not lose data per se if you don't have append support, but the client may be led to believe, incorrectly, that they were successfully applied, under the same low-probability failure conditions that can corrupt META.
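The false-acknowledgment case can be illustrated with one more toy model (not HBase code; names invented): the server acks the Delete as soon as it is appended to the log buffer, but the buffer never becomes durable, so after crash recovery the "deleted" row is visible again even though the client was told the operation succeeded.

```java
import java.util.*;

// Toy model (not HBase code): an acknowledged Delete can "un-happen"
// when the WAL append never becomes durable. The client sees success;
// after a crash and log replay, the deleted cell is visible again.
class LostAckSketch {
    static class Server {
        final List<String[]> durableLog = new ArrayList<>(); // survives crash
        final List<String[]> pendingLog = new ArrayList<>(); // lost on crash
        final boolean syncWorks;
        Server(boolean syncWorks) { this.syncWorks = syncWorks; }

        boolean apply(String op, String row) {
            pendingLog.add(new String[]{op, row});
            if (syncWorks) { durableLog.addAll(pendingLog); pendingLog.clear(); }
            return true; // ack sent to the client either way
        }
    }

    // Replay the durable log over an initial snapshot of the table.
    static Set<String> recover(Set<String> snapshot, Server s) {
        Set<String> table = new HashSet<>(snapshot);
        for (String[] e : s.durableLog) {
            if (e[0].equals("delete")) table.remove(e[1]); else table.add(e[1]);
        }
        return table;
    }

    public static void main(String[] args) {
        Set<String> snapshot = new HashSet<>(Collections.singleton("row1"));
        Server s = new Server(false); // broken append/sync
        boolean acked = s.apply("delete", "row1");
        assert acked; // client believes the delete succeeded
        Set<String> table = recover(snapshot, s);
        assert table.contains("row1"); // ...but the row is back after recovery
        System.out.println("row1 present after crash: " + table.contains("row1"));
    }
}
```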

  - Andy
