sqoop-dev mailing list archives

From "Ryan Blue (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1744) TO-side: Write data to HBase
Date Tue, 02 Dec 2014 18:26:12 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231872#comment-14231872 ]

Ryan Blue commented on SQOOP-1744:

[~vinothchandar], there are many trade-offs in getting your data into Parquet format when
records might change. Can you put any bounds on which records might change? If so, we have
many more options. For example, if only records created in the last 5 minutes can receive
updates, we can keep those in an HBase table and copy each 5-minute window out of it once we
know its records will no longer change.
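
A minimal sketch of that windowing rule, assuming each record carries its creation time and that updates only arrive within a fixed period after creation. `WINDOW`, `MUTABLE_PERIOD`, `window_start` and `closed_windows` are hypothetical names, not Sqoop or HBase APIs. The sketch only decides which windows are safe to copy; the HBase scan and the Parquet write are not shown.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical parameters: records are only updated during the first
# MUTABLE_PERIOD after creation, and data is exported in WINDOW-sized slices.
WINDOW = timedelta(minutes=5)
MUTABLE_PERIOD = timedelta(minutes=5)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime, window: timedelta = WINDOW) -> datetime:
    """Align a timestamp down to the start of the window containing it."""
    whole_windows = (ts - EPOCH) // window  # int: complete windows since epoch
    return EPOCH + whole_windows * window


def closed_windows(last_exported_end: datetime, now: datetime,
                   window: timedelta = WINDOW,
                   mutable: timedelta = MUTABLE_PERIOD):
    """Return (start, end) pairs for windows whose records can no longer change.

    A window [start, end) is closed once even its newest record is older
    than the mutable period, i.e. end + mutable <= now.
    """
    windows = []
    start = last_exported_end
    while start + window + mutable <= now:
        windows.append((start, start + window))
        start += window
    return windows
```

For example, with the last export ending at 12:00 and the clock at 12:17, only the 12:00-12:05 and 12:05-12:10 windows are returned; 12:10-12:15 stays in HBase until 12:20, when its last records can no longer be updated.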

bq. we can only convert the whole data set in HBase to Parquet every time, as I understand

Actually, we can select a subset of the records in HBase and copy only those to Parquet. One
big concern, though, is having enough data in each copy: we generally want to avoid writing
small Parquet files.
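
One way to address the small-file concern, sketched under the assumption that the record count of each closed window is known: hold windows back and combine consecutive ones until a batch reaches a minimum size. `plan_batches` and `min_records` are hypothetical; a real threshold would more likely be a target Parquet file size in bytes than a record count.

```python
from typing import List, Tuple


def plan_batches(window_counts: List[Tuple[str, int]],
                 min_records: int) -> Tuple[List[List[str]], List[str]]:
    """Group consecutive closed windows so each batch has >= min_records.

    window_counts: (window_id, record_count) pairs in time order.
    Returns (batches, pending): batches are ready to be written as one
    Parquet file each; pending windows are too small so far and should be
    combined with windows that close later.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_count = 0
    for window_id, count in window_counts:
        current.append(window_id)
        current_count += count
        if current_count >= min_records:
            batches.append(current)
            current, current_count = [], 0
    return batches, current
```

With windows of 300, 500, 1200 and 100 records and `min_records=1000`, the first three windows form one batch and the last window stays pending. Holding windows back trades export latency for larger files.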

> TO-side: Write data to HBase
> ----------------------------
>                 Key: SQOOP-1744
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1744
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: connectors
>            Reporter: Qian Xu
>            Assignee: Qian Xu
>             Fix For: 1.99.5
> Propose to write data into HBase. Note that, unlike HDFS, HBase is append only.
> Merge does not work for HBase.

This message was sent by Atlassian JIRA
