hbase-user mailing list archives

From Andrew Mains <andrew.ma...@kontagent.com>
Subject Re: Can TableSnapshotInputFormat support multiple snapshots as the MR input?
Date Sun, 31 May 2015 23:51:08 GMT
Hi Shaofeng,

Sorry about the delayed response; I was on vacation last week.

We (Upsight) are actually also on 0.98, and I have a version of the 
patch rebased against 0.98.12, which I'll upload to the JIRA ticket. 
We've had success with running just the patched hbase-server jar with 
our mapreduce jobs (deploying it without touching our server 
installations), so I imagine it should work for you as well (in 
particular if you happen to be already building/maintaining an HBase fork).

Let me know if you run into any issues!


On 5/22/15 8:11 PM, Shi, Shaofeng wrote:
> Hi Andrew, this is what we need, thank you! In which version will this
> feature be released? Our HBase is v0.98; is it possible to just apply this
> patch to get the feature?
> On 5/22/15, 6:06 PM, "Andrew Mains" <andrew.mains@kontagent.com> wrote:
>> In the latest release, no; however I've filed a ticket here
>> https://issues.apache.org/jira/browse/HBASE-13356  for this feature, and
>> uploaded a patch for review.
>> The patch provides a MultiTableSnapshotInputFormat which can run a list
>> of scans over multiple snapshots. Jobs can be initialized using:
>>   public static void initMultiTableSnapshotMapperJob(
>>       Map<String, Collection<Scan>> snapshotScans,
>>       Class<? extends TableMapper> mapper,
>>       Class<?> outputKeyClass, Class<?> outputValueClass,
>>       Job job, boolean addDependencyJars, Path tmpRestoreDir) throws IOException {
>> Hope this helps!
>> Andrew
>> On 5/22/15 2:35 AM, Shi, Shaofeng wrote:
>>> Hello,
>>> We have a scenario that requires periodically merging multiple HBase
>>> tables into one table. To get better performance and minimize the impact
>>> on the HBase server, we are evaluating
>>> TableSnapshotInputFormat
>>> (http://www.slideshare.net/enissoz/mapreduce-over-snapshots), but from
>>> the API we see that it only allows one snapshot as input. Is it possible to
>>> change it to allow multiple snapshots?
>>> Thanks in advance for any advice.
>>> Shaofeng Shi
>>> Apache Kylin
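
The `snapshotScans` argument in the signature quoted above maps each snapshot name to the collection of scans to run over it. A minimal sketch of how such a map might be assembled is shown here using plain Java only, so that it runs without HBase on the classpath. `RowRange` is a hypothetical stand-in for `org.apache.hadoop.hbase.client.Scan`, and the snapshot names and row ranges are made up for illustration; none of them come from the patch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

class SnapshotScanPlan {

    // Hypothetical stand-in for org.apache.hadoop.hbase.client.Scan:
    // only a half-open row-key range [startRow, stopRow).
    static final class RowRange {
        final String startRow;
        final String stopRow;

        RowRange(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + stopRow + ")";
        }
    }

    // Registers one more scan range under the given snapshot name.
    static void addScan(Map<String, Collection<RowRange>> plan,
                        String snapshotName, RowRange range) {
        plan.computeIfAbsent(snapshotName, k -> new ArrayList<>()).add(range);
    }

    // Builds the snapshot-name -> scans map; names and ranges are made up.
    static Map<String, Collection<RowRange>> build() {
        Map<String, Collection<RowRange>> plan = new LinkedHashMap<>();
        addScan(plan, "table_a_snapshot", new RowRange("a", "n"));
        addScan(plan, "table_b_snapshot", new RowRange("a", "h"));
        addScan(plan, "table_b_snapshot", new RowRange("h", "z"));
        return plan;
    }

    public static void main(String[] args) {
        // Print each snapshot with the scan ranges planned for it.
        for (Map.Entry<String, Collection<RowRange>> e : build().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

In a real job the values would be `Scan` objects, and the finished map would be passed as the first argument to `initMultiTableSnapshotMapperJob`, together with the mapper class, the output key and value classes, the `Job`, and a temporary restore directory.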
