hbase-user mailing list archives

From Anoop Sam John <anoo...@huawei.com>
Subject RE: Bulk Loading - LoadIncrementalHFiles
Date Fri, 02 Nov 2012 03:55:55 GMT
Yes, the table can be presplit before doing the bulk load. The job will then have the
same number of reducers as the table has regions, one per region. Each HFile that a reducer
generates is capped at the configured maximum HFile size.
During the bulk load itself, HFiles are also split if needed (for example, when regions have
split in the meantime and an HFile no longer fits within a single region).
And yes, if the table is not presplit, the load will lead to region splits later.
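
To illustrate the one-reducer-per-region point and why an HFile may need splitting
at load time, here is a minimal Java sketch. It is not HBase code: the class and
method names (RegionLookupSketch, regionIndexFor, straddlesBoundary) and the sample
start keys are hypothetical. It only assumes that regions are described by sorted
start keys and that row keys compare as unsigned bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names come from HBase.
final class RegionLookupSketch {

    // Unsigned lexicographic comparison of two row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Index of the region whose start key is the largest one <= row.
    // startKeys must be sorted, and startKeys[0] must be the empty key.
    static int regionIndexFor(byte[][] startKeys, byte[] row) {
        int lo = 0;
        int hi = startKeys.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (compare(startKeys[mid], row) <= 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // True when a file covering [firstRow, lastRow] spans more than one region,
    // which is the situation where it would have to be split before loading.
    static boolean straddlesBoundary(byte[][] startKeys, byte[] firstRow, byte[] lastRow) {
        return regionIndexFor(startKeys, firstRow) != regionIndexFor(startKeys, lastRow);
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three assumed regions: ["", "g"), ["g", "p"), ["p", end).
        byte[][] startKeys = { bytes(""), bytes("g"), bytes("p") };
        System.out.println(regionIndexFor(startKeys, bytes("apple")));  // 0
        System.out.println(regionIndexFor(startKeys, bytes("hello")));  // 1
        System.out.println(regionIndexFor(startKeys, bytes("zebra")));  // 2
        System.out.println(straddlesBoundary(startKeys, bytes("apple"), bytes("hello"))); // true
        System.out.println(straddlesBoundary(startKeys, bytes("ga"), bytes("gz")));       // false
    }
}
```

With one reducer per region, every row a reducer writes maps to the same index, so
its output fits a single region unless the region boundaries change before the load.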

I would say the better way is to presplit the table.
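
As a rough sketch of what choosing presplit points could look like, the Java below
generates evenly spaced one-byte split keys. The class and method names
(PresplitKeysSketch, evenSplitKeys) are hypothetical, and the even spacing assumes
row keys are spread uniformly over the first byte (for example, hashed prefixes),
which may not hold for your data.

```java
// Hypothetical illustration only; none of these names come from HBase.
final class PresplitKeysSketch {

    // Returns regionCount - 1 single-byte split keys that cut the first-byte
    // range 0x00..0xFF into roughly equal slices.
    // Assumption: row keys are uniformly distributed over their first byte.
    static byte[][] evenSplitKeys(int regionCount) {
        if (regionCount < 1 || regionCount > 256) {
            throw new IllegalArgumentException("regionCount must be in 1..256");
        }
        byte[][] splits = new byte[regionCount - 1][];
        for (int i = 1; i < regionCount; i++) {
            splits[i - 1] = new byte[] { (byte) (i * 256 / regionCount) };
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four regions need three split keys.
        for (byte[] key : evenSplitKeys(4)) {
            System.out.println(String.format("%02x", key[0] & 0xff));
        }
        // prints 40, 80 and c0, one per line
    }
}
```

Split keys like these are the kind of input a table-creation call that accepts
split keys would take (for example HBaseAdmin.createTable(desc, splitKeys), if
your version provides it). With skewed keys, sample the data and pick the split
points from the actual key distribution instead.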

From: Amit Sela [amits@infolinks.com]
Sent: Thursday, November 01, 2012 10:33 PM
To: user@hbase.apache.org
Subject: Bulk Loading - LoadIncrementalHFiles

Hi everyone,

I'm using MR to bulk load into HBase by
using HFileOutputFormat.configureIncrementalLoad, and after the job is
complete I use LoadIncrementalHFiles.doBulkLoad.

From what I see, the MR job outputs a file for each CF written, and to my
understanding these files are loaded as store files into a region.

What I don't understand is *how many regions will open* and *how is that
determined*?
If I have 3 CFs and a lot of data to load, does that mean 3 large store
files will load into 1 region (or more?) and this region will split on major
compaction?

Can I pre-create regions and tell the bulk load to split the data between
them during the load?

In general, if someone could elaborate about LoadIncrementalHFiles it would
save me a lot of time diving into it.

Another question is about overwriting values: is it possible to load an
updated value, or more generally to update columns and values for an existing
key?
I'd think that there's no problem, but when I try to run the same bulk load
twice (MR and then load) with the same data, the second time fails.
Right after mapreduce.LoadIncrementalHFiles: Trying to load hfile=........
I get: ERROR mapreduce.LoadIncrementalHFiles: Unexpected execution
exception during splitting...
