hbase-user mailing list archives

From Amit Sela <am...@infolinks.com>
Subject Bulk Loading - LoadIncrementalHFiles
Date Thu, 01 Nov 2012 17:03:34 GMT
Hi everyone,

I'm using MR to bulk load into HBase: I set up the job with
HFileOutputFormat.configureIncrementalLoad, and after the job is
complete I call loadIncrementalHFiles.doBulkLoad.

From what I see, the MR job outputs a file for each CF written, and to my
understanding these files are loaded as store files into a region.

What I don't understand is *how many regions will be opened* and *how that
is determined*.
If I have 3 CFs and a lot of data to load, does that mean 3 large store
files will be loaded into 1 region (or more?), and that this region will
split on major compaction?

Can I pre-create regions and have the bulk load split the data between
them during the load?
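To picture the pre-splitting question, here is a small HBase-free Java sketch. It assumes, as the HFileOutputFormat.configureIncrementalLoad javadoc describes, that the job runs one reduce task per region that exists when the job is configured and partitions output on region boundaries, so regions pre-created with split keys (for example via HBaseAdmin.createTable(HTableDescriptor, byte[][] splitKeys)) decide where each row lands. The class RegionRouter, its methods, and the split keys below are hypothetical illustrations, not HBase API.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch (not HBase code): route row keys to pre-split regions
 * the way region boundaries partition a bulk load. N split keys -> N + 1 regions.
 */
public class RegionRouter {

    /** Lexicographic comparison of unsigned bytes, the order HBase uses for row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /**
     * Returns the region index for rowKey. Region 0 holds keys below splitKeys[0];
     * region i holds [splitKeys[i-1], splitKeys[i]) (start inclusive, end exclusive).
     * Binary search counts how many split keys are <= rowKey.
     */
    static int regionFor(byte[][] splitKeys, byte[] rowKey) {
        int lo = 0;
        int hi = splitKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(splitKeys[mid], rowKey) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical split keys: 3 keys -> 4 pre-created regions.
        byte[][] splitKeys = { b("d"), b("m"), b("t") };
        String[] rows = { "apple", "d", "kiwi", "melon", "zebra" };
        for (String row : rows) {
            System.out.println(row + " -> region " + regionFor(splitKeys, b(row)));
        }
    }
}
```

Running main prints apple -> region 0, d -> region 1, kiwi -> region 1, melon -> region 2, zebra -> region 3. Under the assumption above, each region would get its own reducer and therefore its own HFile per column family.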

In general, if someone could elaborate on LoadIncrementalHFiles, it would
save me a lot of time digging into it.

Another question is about overwriting values: is it possible to load an
updated value, or more generally to update columns and values for an
existing key?
I'd expect that to work, but when I run the same bulk load twice (MR and
then load) with the same data, the second run fails. Right after

  mapreduce.LoadIncrementalHFiles: Trying to load hfile=........

I get:

  ERROR mapreduce.LoadIncrementalHFiles: Unexpected execution exception during splitting...

