sqoop-dev mailing list archives

From "Vinoth Chandar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1744) TO-side: Write data to HBase
Date Tue, 02 Dec 2014 18:56:12 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231912#comment-14231912 ]

Vinoth Chandar commented on SQOOP-1744:

>> Can you put any bounds on what records might change?
We have our own usage patterns. But I don't think we can expect only records from the last
5 minutes to change, even for typical OLTP workloads, right (e.g. Uber table, profile table,

>> Actually, we can select a subset of the records in HBase and copy them to Parquet
Not sure I explained myself clearly, so let me take another shot.

Once we do a full fetch, we could do something like the below for the subsequent incremental
fetch:
(Assume: we did a select * from users; and produced a number of Parquet files that contain
records from a User table, rows organized by the table pk userid)

1) Obtain all rows that changed since the last run.
2) Write those rows into HBase to merge.
3) Then pull them out again & rewrite the affected Parquet files.

But in this flow, step 2 does not buy us anything, right? We still need to do the work of
identifying the affected Parquet files and overwriting only those. That's why I was
saying, only if you convert the whole dataset from HFile to Parquet, you get an out-of-the-box

Maybe I am missing something?
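
To make the "identify the affected files" part concrete, here is a rough sketch of what I mean.
It is only an illustration under assumptions: each Parquet file is modeled as an in-memory dict
keyed by userid, files are ordered by their smallest userid, and the names affected_files and
merge_incremental are made up for this example, not anything in Sqoop or Parquet.

```python
from bisect import bisect_right


def affected_files(file_min_keys, changed_rows):
    """Group changed rows by the index of the file whose key range covers them.

    file_min_keys is the sorted list of the smallest userid in each file
    (a hypothetical stand-in for per-file min/max statistics).
    """
    groups = {}
    for row in changed_rows:
        # Rightmost file whose minimum key is <= userid; clamp to file 0
        # for keys smaller than anything seen so far.
        idx = max(bisect_right(file_min_keys, row["userid"]) - 1, 0)
        groups.setdefault(idx, []).append(row)
    return groups


def merge_incremental(files, changed_rows):
    """Rewrite only the files touched by changed_rows; return their indices.

    files is a list of non-empty dicts (userid -> row), one per "Parquet file".
    """
    file_min_keys = [min(f) for f in files]
    groups = affected_files(file_min_keys, changed_rows)
    for idx, rows in groups.items():
        rewritten = dict(files[idx])        # copying stands in for rewriting one file
        for row in rows:
            rewritten[row["userid"]] = row  # upsert by primary key
        files[idx] = rewritten
    return sorted(groups)
```

Note that nothing here needs HBase: the changed rows from step 1 are enough to pick out and
rewrite the affected files, and untouched files are left alone.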

> TO-side: Write data to HBase
> ----------------------------
>                 Key: SQOOP-1744
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1744
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: connectors
>            Reporter: Qian Xu
>            Assignee: Qian Xu
>             Fix For: 1.99.5
> Propose to write data into HBase. Note that, unlike HDFS, HBase is append-only.
> Merge does not work for HBase.

This message was sent by Atlassian JIRA