sqoop-dev mailing list archives

From "Ryan Blue (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1744) TO-side: Write data to HBase
Date Tue, 02 Dec 2014 18:26:12 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231872#comment-14231872 ]

Ryan Blue commented on SQOOP-1744:
----------------------------------

[~vinothchandar], there are a lot of trade-offs to get your data into Parquet format when
records might be changed. Can you put any bounds on what records might change? If so, we have
a lot more options. For example, if only records created in the last 5 minutes might receive
updates, then we can keep those in an HBase table and copy 5-minute windows from it once we
know that the records aren't going to change.
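
To illustrate the windowing idea above, here is a minimal sketch of how one might decide which windows are safe to copy. It is not from this thread's patch or from Sqoop itself: the function name closed_windows, the constants WINDOW and MUTABLE_FOR, and the 5-minute values (taken from the example above) are all assumptions, and the actual HBase read and Parquet write are left out.

```python
from datetime import datetime, timedelta, timezone

# Assumed values for illustration, taken from the 5-minute example above.
WINDOW = timedelta(minutes=5)        # size of each copy window
MUTABLE_FOR = timedelta(minutes=5)   # how long after creation a record may still change


def closed_windows(last_copied_end, now):
    """Return the [start, end) windows that are safe to copy out of HBase.

    A window is safe once every record created in it is past the
    mutability bound, i.e. when end + MUTABLE_FOR <= now.
    """
    windows = []
    start = last_copied_end
    while start + WINDOW + MUTABLE_FOR <= now:
        windows.append((start, start + WINDOW))
        start += WINDOW
    return windows


if __name__ == "__main__":
    t0 = datetime(2014, 12, 2, 12, 0, tzinfo=timezone.utc)
    for start, end in closed_windows(t0, t0 + timedelta(minutes=17)):
        print(start.isoformat(), "->", end.isoformat())
```

Each returned window could then drive a time-bounded read of the HBase table; the window that is still open simply stays in HBase until a later run.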

bq. we can only convert the whole data set in HBase to Parquet every time, as I understand

Actually, we can select a subset of the records in HBase and copy them to Parquet. One big
concern is having enough data, though. We generally want to avoid small Parquet files.
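
One way to picture the small-file concern is to hold back a copy until enough data has accumulated. The sketch below is only an illustration under stated assumptions: plan_copies, the (window_id, estimated_bytes) input shape, and the 128 MB target are hypothetical and do not come from this thread or from Sqoop.

```python
# Assumed target size per Parquet file; not a figure from this thread.
TARGET_BYTES = 128 * 1024 * 1024


def plan_copies(windows, target_bytes=TARGET_BYTES):
    """Group consecutive subsets of records into copy jobs of at least target_bytes.

    `windows` is a list of (window_id, estimated_bytes) pairs in time order.
    Returns (jobs, leftover): each job is a list of window ids to write
    together as one Parquet file; leftover windows stay in HBase until
    more data arrives.
    """
    jobs, current, size = [], [], 0
    for window_id, est_bytes in windows:
        current.append(window_id)
        size += est_bytes
        if size >= target_bytes:
            jobs.append(current)
            current, size = [], 0
    return jobs, current


if __name__ == "__main__":
    jobs, leftover = plan_copies(
        [("w1", 60), ("w2", 50), ("w3", 30), ("w4", 10)], target_bytes=100
    )
    print(jobs, leftover)
```

The trade-off is latency: records in the leftover subsets are not visible in Parquet until a later run pushes them over the threshold.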

> TO-side: Write data to HBase
> ----------------------------
>
>                 Key: SQOOP-1744
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1744
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: connectors
>            Reporter: Qian Xu
>            Assignee: Qian Xu
>             Fix For: 1.99.5
>
>
> Propose to write data into HBase. Note that, unlike HDFS, HBase is append-only.
> Merge does not work for HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
