hbase-user mailing list archives

From Andreas Neumann <neun...@gmail.com>
Subject Re: hadoop without append in the absence of puts
Date Thu, 23 Jun 2011 01:45:04 GMT
Thanks, Andy, for the clear response.

We are indeed going to use bulk load only, and no puts, deletes or
increments. So the only puts we will have are those caused by changes in
the table structure. I guess that includes region splits, but also the
reassignment of a region after its region server has died.

I agree that even though these are rare, they are not rare enough to take a
risk. But they could be rare enough to justify a less efficient WAL
implementation. Would it be reasonable to use an implementation of HLog
that, at the cost of performance, persists the WAL to HDFS without relying
on append?

Cheers -Andreas.


On Wed, Jun 22, 2011 at 4:36 PM, Andrew Purtell <apurtell@apache.org> wrote:

> > From: Andreas Neumann <neunand@gmail.com>
> > If we only load data in bulk (that is, via doBulkLoad(), not using
> > TableOutputFormat), do we still risk data loss? My understanding is
> > that append is needed for the WAL, and the WAL is needed only
> > for puts. But bulk loads bypass the WAL.
>
> Correct.
>
> If you are doing read-only serving of HFiles built by MR and loaded by
> doBulkLoad, then you would not need append support.
>
> If adding new data to tables via the HBase API, then sooner or later this
> will change table structure, which is recorded via Puts to META, which is
> self-hosted. Circumstances where those edits can be lost without working
> append support in HDFS may be rare but not rare enough in my estimation.
> Losing edits to META is bad. This can lead to missing regions and hung
> clients. Human intervention will be necessary and the time scale for
> administrative recovery is usually an availability problem.
>
> > For instance, when a region is split, the master must write
> > the new meta data to the meta regions. Would that require a WAL
> > or rely on append in some other way?
>
> See above.
>
> > Are there other situations where the WAL is needed (or append
> > is needed) to avoid data loss?
>
> Deletes? Increments? For these operations you would not lose data per se if
> you don't have append support, but the client may be incorrectly led to
> believe they were successfully applied under the same low probability
> failure conditions that can corrupt META.
>
>  - Andy
>
>
