hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: bulk loading and RegionObservers
Date Thu, 12 Jan 2012 20:04:31 GMT
> I think that the people demanding such a method of access would like to have
> the ability to trigger the action on a row level (so again when a Put with new
> values comes in). But I think that this would not scale - it would take a long
> time to scan the new region and fire a prePut() call on the RO for the new region?

CPs hook compaction by letting you wrap the scanner that iterates over the store files.
The wrapper gets a chance to examine the KeyValues being processed and can also modify
or drop them.

Similarly for incoming HFiles for bulk load, the CP could be given a scanner iterating over
those files, if you had a RegionObserver installed. You would be given the option in effect
to rewrite the incoming HFiles before they are handed over to the RegionServer for addition
to the region.
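
As a rough illustration of the scanner-wrapping idea described above, here is a minimal Java sketch. It is not the HBase API: the Cell class, FilteringScanner, and the "tmp" column are hypothetical stand-ins, and a plain java.util.Iterator plays the role of the scanner over store files or incoming HFiles. The wrapper sees every entry in bulk, drops some, and rewrites others before passing them on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ScannerWrapSketch {
    /** Hypothetical stand-in for a KeyValue: just row, column and value. */
    public static final class Cell {
        final String row;
        final String column;
        final String value;

        public Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    /** Hypothetical wrapper around an underlying "scanner" (here an Iterator). */
    public static final class FilteringScanner implements Iterator<Cell> {
        private final Iterator<Cell> delegate;
        private Cell next; // look-ahead: the next cell to hand out, or null

        public FilteringScanner(Iterator<Cell> delegate) {
            this.delegate = delegate;
            advance();
        }

        // Pull cells from the wrapped scanner until one survives the filter.
        private void advance() {
            next = null;
            while (delegate.hasNext()) {
                Cell c = delegate.next();
                if (c.column.equals("tmp")) {
                    continue; // drop: cells in the (assumed) "tmp" column
                }
                // modify: normalise the value before passing it on
                next = new Cell(c.row, c.column, c.value.trim());
                return;
            }
        }

        @Override
        public boolean hasNext() {
            return next != null;
        }

        @Override
        public Cell next() {
            if (next == null) {
                throw new NoSuchElementException();
            }
            Cell out = next;
            advance();
            return out;
        }
    }

    /** Runs the wrapper over the input and renders the surviving cells. */
    public static List<String> rewrite(List<Cell> input) {
        List<String> out = new ArrayList<>();
        Iterator<Cell> scanner = new FilteringScanner(input.iterator());
        while (scanner.hasNext()) {
            Cell c = scanner.next();
            out.add(c.row + "/" + c.column + "=" + c.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> incoming = Arrays.asList(
                new Cell("r1", "d", " a "),
                new Cell("r1", "tmp", "x"),
                new Cell("r2", "d", "b"));
        System.out.println(rewrite(incoming)); // prints [r1/d=a, r2/d=b]
    }
}
```

The point of the shape is that the wrapper never sees a per-row callback; it only sees a stream, which matches the bulk nature of compaction and bulk load input.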

This is the right approach to interface design here, IMO, because the fact you are given a
scanner highlights the bulk nature of the input.

Is this something you could use?


Best regards,


  - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


----- Original Message -----
> From: Stanislav Barton <stanislav.barton@internetmemory.net>
> To: user@hbase.apache.org
> Cc: 
> Sent: Thursday, January 12, 2012 3:03 AM
> Subject: Re: bulk loading and RegionObservers
> 
> Andrew Purtell <apurtell@...> writes:
> 
>> 
>>  Yes this is correct.
>> 
>>  Coprocessors / RegionObservers and bulk loading have been developing
>>  separately in parallel.
>> 
>>  Now that bulk loading changes are settling down, I've been considering
>>  adding CP hooks into the bulk load process, at the HRegion level, without
>>  complicating atomicity. A simple and straightforward course of action is
>>  to give the CP the option of rewriting the submitted store file(s) before
>>  the regionserver attempts to validate and move them into the store. This
>>  is similar to how CPs are hooked into compaction.
>>  Would this be sufficient for what you want to do?
>>   
>>  Best regards,
>> 
>>         - Andy
>> 
>>  Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
>> 
>>  >________________________________
>>  > From: Stanislav Barton <stanislav.barton <at> internetmemory.net>
>>  >To: user@... 
>>  >Sent: Wednesday, January 11, 2012 6:47 AM
>>  >Subject: bulk loading and RegionObservers
>>  > 
>>  >Hello,
>>  >
>>  >I tried to find the information in the documentation but it is still
>>  >not clear to me. I do a lot of bulk loading using the MapReduce job
>>  >whose output is HFiles that are automatically loaded to HBase and I
>>  >was wondering whether this way (my guess is that it is so) I do bypass
>>  >the RegionObserver mechanisms. Meaning that such defined coprocessors
>>  >won't get fired up when the new data is loaded in HBase. Is my
>>  >assumption correct?
>>  >
>>  >Stan
>>  >
>>  >
>>  >
> 
> 
> I think that the people demanding such a method of access would like to have
> the ability to trigger the action on a row level (so again when a Put with new
> values comes in). But I think that this would not scale - it would take a long
> time to scan the new region and fire a prePut() call on the RO for the new region?
> I have experience doing 30GB bulk load steps into a pre-split table in order to
> maintain the highest throughput and reduce overhead as much as possible (on a
> fairly small cluster (~10) of small machines).
> 
> --
> 
> Stan
> 
