spark-dev mailing list archives

From Chetan Khatri <chetan.opensou...@gmail.com>
Subject Re: HBaseContext with Spark
Date Sat, 28 Jan 2017 07:51:16 GMT
Storage handler bulk load:

SET hive.hbase.bulk=true;
INSERT OVERWRITE TABLE users SELECT … ;
But for now, you have to do some work and issue multiple Hive commands:

1. Sample source data for range partitioning.
2. Save sampling results to a file.
3. Run a CLUSTER BY query using HiveHFileOutputFormat and TotalOrderPartitioner
   (sorts data, producing a large number of region files).
4. Import HFiles into HBase.

HBase can merge files if necessary.
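
To make the sampling and range-partitioning steps concrete, here is a minimal
sketch in plain Python. It is not Hive or HBase code. The function names
choose_split_keys and partition_for are made up for illustration, and it assumes
row keys are strings compared lexicographically. It only shows the idea: pick
boundary keys from a sample, then route each key to a range by binary search,
which is roughly what a total-order partitioner does with the saved sample file.

```python
import bisect


def choose_split_keys(sample_keys, num_regions):
    """Pick num_regions - 1 boundary keys from a sample of row keys.

    Hypothetical helper: the boundaries are evenly spaced positions in the
    sorted, de-duplicated sample.
    """
    if num_regions < 2:
        return []  # a single range needs no boundaries
    ordered = sorted(set(sample_keys))
    if len(ordered) < num_regions:
        raise ValueError("sample has too few distinct keys for that many regions")
    step = len(ordered) / num_regions
    return [ordered[int(i * step)] for i in range(1, num_regions)]


def partition_for(key, split_keys):
    """Return the index of the range a key falls into.

    Keys below split_keys[0] go to range 0; a key equal to a boundary goes to
    the range that starts at that boundary.
    """
    return bisect.bisect_right(split_keys, key)


if __name__ == "__main__":
    # Stand-in for sampled row keys; a real job would sample the source table.
    sample = ["k%03d" % i for i in range(100)]
    splits = choose_split_keys(sample, 4)
    for key in ("k000", "k025", "k060", "k099"):
        print(key, "-> range", partition_for(key, splits))
```

Each range then corresponds to one sorted output file, which is why the sample
should be representative: skewed boundaries produce unevenly sized files.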

On Sat, Jan 28, 2017 at 11:32 AM, Chetan Khatri <chetan.opensource@gmail.com> wrote:

> @Ted, I don't think so.
>
> On Thu, Jan 26, 2017 at 6:35 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Does the storage handler provide bulk load capability?
>>
>> Cheers
>>
>> On Jan 25, 2017, at 3:39 AM, Amrit Jangid <amrit.jangid@goibibo.com>
>> wrote:
>>
>> Hi Chetan,
>>
>> If you just need HBase data in Hive, you can use a Hive EXTERNAL TABLE
>> with
>>
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'.
>>
>>
>> Try this and see if it solves your problem:
>>
>>
>> https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
>>
>>
>> Regards
>>
>> Amrit
>>
>>
>> On Wed, Jan 25, 2017 at 5:02 PM, Chetan Khatri <
>> chetan.opensource@gmail.com> wrote:
>>
>>> Hello Spark Community Folks,
>>>
>>> Currently I am using HBase 1.2.4 and Hive 1.2.1, and I am looking for a
>>> way to bulk load from HBase to Hive.
>>>
>>> I have seen a couple of good examples in the HBase GitHub repo:
>>> https://github.com/apache/hbase/tree/master/hbase-spark
>>>
>>> If I would like to use HBaseContext with HBase 1.2.4, how can it be done?
>>> Or which version of HBase is more stable with HBaseContext?
>>>
>>> Thanks.
>>>
>>
>
