spark-dev mailing list archives

From Chetan Khatri <chetan.opensou...@gmail.com>
Subject Re: Approach: Incremental data load from HBASE
Date Wed, 04 Jan 2017 11:37:45 GMT
Ted Yu,

You misunderstood. I meant an incremental load from HBase to Hive;
taken on its own, you could call it an incremental import from HBase.

On Wed, Dec 21, 2016 at 10:04 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Incremental load traditionally means generating HFiles and
> using org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles to load the
> data into HBase.
>
> For your use case, the producer needs to find rows where the flag is 0 or
> 1.
> After such rows are obtained, it is up to you how the result of processing
> is delivered to hbase.
>
> Cheers
>
> On Wed, Dec 21, 2016 at 8:00 AM, Chetan Khatri <
> chetan.opensource@gmail.com> wrote:
>
>> OK, sure, I will ask.
>>
>> But what would be a generic best-practice solution for incremental load
>> from HBase?
>>
>> On Wed, Dec 21, 2016 at 8:42 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>>> I haven't used Gobblin.
>>> You could ask on the Gobblin mailing list about the first option.
>>>
>>> The second option would work.
>>>
>>>
>>> On Wed, Dec 21, 2016 at 2:28 AM, Chetan Khatri <
>>> chetan.opensource@gmail.com> wrote:
>>>
>>>> Hello guys,
>>>>
>>>> I would like to understand the different approaches for a distributed
>>>> incremental load from HBase. Is there any *tool / incubator tool* that
>>>> satisfies this requirement?
>>>>
>>>> *Approach 1:*
>>>>
>>>> Write a Kafka producer, manually maintain a column flag for events, and
>>>> ingest them with LinkedIn Gobblin to HDFS / S3.
>>>>
>>>> *Approach 2:*
>>>>
>>>> Run a scheduled Spark job: read from HBase, apply transformations, and
>>>> maintain the flag column at the HBase level.
>>>>
>>>> In both approaches above, I need to maintain column-level flags, such as
>>>> 0 (default), 1 (sent), and 2 (sent and acknowledged), so that next time the
>>>> producer takes another batch of 1000 rows where the flag is 0 or 1.
>>>>
>>>> I am looking for a best-practice approach with any distributed tool.
>>>>
>>>> Thanks.
>>>>
>>>> - Chetan Khatri
>>>>
>>>
>>>
>>
>
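
The flag scheme discussed in the thread (0 = default, 1 = sent, 2 = sent and acknowledged, with each batch taking up to 1000 rows whose flag is 0 or 1) can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain in-memory dict stands in for the HBase table, and the names `Row`, `next_batch`, and `mark` are hypothetical, not part of HBase, Spark, or Gobblin. A real job would replace the dict lookup with an HBase scan filtered on the flag column.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Dict, List

# Flag values as described in the thread.
PENDING = 0       # default: not yet sent
SENT = 1          # sent, but not yet acknowledged
ACKNOWLEDGED = 2  # sent and acknowledged


@dataclass
class Row:
    """Hypothetical stand-in for one HBase row with a flag column."""
    key: str
    payload: str
    flag: int = PENDING


def next_batch(table: Dict[str, Row], batch_size: int = 1000) -> List[Row]:
    """Return up to batch_size rows, in key order, whose flag is 0 or 1."""
    candidates = (
        table[key] for key in sorted(table)
        if table[key].flag in (PENDING, SENT)
    )
    return list(islice(candidates, batch_size))


def mark(batch: List[Row], flag: int) -> None:
    """Set the flag column on every row of the batch."""
    for row in batch:
        row.flag = flag


if __name__ == "__main__":
    # Assumption: 2500 synthetic rows in place of a real table.
    table = {
        f"row-{i:05d}": Row(f"row-{i:05d}", f"payload-{i}")
        for i in range(2500)
    }
    while True:
        batch = next_batch(table)
        if not batch:
            break
        mark(batch, SENT)          # rows handed to the downstream sink
        mark(batch, ACKNOWLEDGED)  # sink confirmed delivery
        print(f"processed {len(batch)} rows")
```

Because rows with flag 1 remain eligible, a batch that was sent but never acknowledged is picked up again on the next run, so the downstream sink would need to tolerate duplicate deliveries.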
