hbase-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Does HBase bulkload support Incremental data?
Date Tue, 27 Mar 2018 14:42:25 GMT
Yes, you can bulk load into a table which already contains data.

The ideal case is that you generate HFiles which map exactly to the 
distribution of Regions on your HBase cluster. However, given that we 
know that Region boundaries can change, the bulk load client 
(LoadIncrementalHFiles) can handle HFiles which no longer fit into a 
single Region. This is done client-side: the client splits each such 
HFile at the Region boundaries and then automatically resubmits the 
resulting files.
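To make the client-side split concrete, here is a minimal sketch. It assumes a table whose Regions are described only by their sorted start keys, with row keys modeled as Strings for readability. The class and method names (`RegionSplitSketch`, `regionFor`, `partitionByRegion`) are hypothetical and are not part of the HBase API. The real LoadIncrementalHFiles works on byte-array keys and physically rewrites HFile contents, which this sketch does not attempt; it only shows which rows end up in which piece.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RegionSplitSketch {
    // Returns the index of the Region whose [startKey, nextStartKey) range holds rowKey.
    // Assumes startKeys is sorted and startKeys.get(0) is "" (the first Region's empty start key).
    static int regionFor(List<String> startKeys, String rowKey) {
        int pos = Collections.binarySearch(startKeys, rowKey);
        // Exact match: rowKey is itself a Region start key. Otherwise binarySearch returns
        // -(insertionPoint) - 1, and the owning Region is the one just before insertionPoint.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Groups the sorted rows of one (hypothetical) HFile by the Region that owns them.
    // One group means the file can be loaded as-is; more than one means it must be split.
    static Map<Integer, List<String>> partitionByRegion(List<String> startKeys, List<String> rows) {
        Map<Integer, List<String>> pieces = new TreeMap<>();
        for (String row : rows) {
            pieces.computeIfAbsent(regionFor(startKeys, row), k -> new ArrayList<>()).add(row);
        }
        return pieces;
    }

    public static void main(String[] args) {
        // Regions: ["", "g"), ["g", "p"), ["p", end)
        List<String> startKeys = List.of("", "g", "p");
        // An HFile generated with stale split points: its rows straddle two Regions.
        List<String> hfileRows = List.of("a", "c", "h", "k");
        Map<Integer, List<String>> pieces = partitionByRegion(startKeys, hfileRows);
        System.out.println(pieces); // {0=[a, c], 1=[h, k]}
        if (pieces.size() > 1) {
            System.out.println("HFile spans " + pieces.size() + " Regions; it must be split and resubmitted");
        }
    }
}
```

Every row of the misaligned file has to be read and written again to produce the pieces, which is why this path gets slow as data volume grows.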

Beware: this is a very expensive and slow process (e.g. consider how 
long it would take to rewrite 100GB of data in a single process because 
you did not use the correct Region split points when creating the data). 
Most bulk loading issues I encounter stem from incorrect split points, 
which cause the bulk load process to take hours to days to complete 
(instead of seconds to minutes).
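One way to catch incorrect split points before starting a slow load is to compare each generated file's first and last row keys against the table's current Region start keys. Because Regions cover contiguous key ranges, a file whose first and last keys land in the same Region fits entirely inside it. The sketch below assumes you already have those keys as Strings; `SplitPointCheck`, `HFileRange`, and the file names are hypothetical illustrations, not HBase classes or real output paths.

```java
import java.util.List;
import java.util.TreeSet;

public class SplitPointCheck {
    // Hypothetical description of one generated HFile: its name plus first and last row keys.
    record HFileRange(String name, String firstKey, String lastKey) {}

    // True if the first and last keys fall in the same Region.
    // Assumes startKeys contains "" so floor() never returns null.
    static boolean fitsOneRegion(TreeSet<String> startKeys, HFileRange f) {
        return startKeys.floor(f.firstKey()).equals(startKeys.floor(f.lastKey()));
    }

    // Counts the files the bulk load client would have to split before loading.
    static long countMisaligned(TreeSet<String> startKeys, List<HFileRange> files) {
        return files.stream().filter(f -> !fitsOneRegion(startKeys, f)).count();
    }

    public static void main(String[] args) {
        // Data was generated for split points "" and "m", but the table has since split at "g".
        TreeSet<String> current = new TreeSet<>(List.of("", "g", "m"));
        List<HFileRange> files = List.of(
                new HFileRange("part-0", "a", "k"),  // crosses "g": must be split client-side
                new HFileRange("part-1", "m", "z")); // fits the ["m", end) Region
        System.out.println(countMisaligned(current, files) + " of " + files.size()
                + " HFiles need splitting"); // 1 of 2 HFiles need splitting
    }
}
```

A nonzero count suggests regenerating the files against the current split points is likely cheaper than letting the client split them.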

On 3/27/18 9:15 AM, Jone Zhang wrote:
> Does HBase bulkload support Incremental data?
> How does it work if the incremental data key-range overlap with the data
> already exists?
> 
> Thanks
> 
