sqoop-dev mailing list archives

From "Ryan Blue (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1744) TO-side: Write data to HBase
Date Mon, 01 Dec 2014 22:49:15 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230615#comment-14230615 ]

Ryan Blue commented on SQOOP-1744:

I see that this is a subtask of "Kite Connector Support", so kindly ignore my last comment.

The reason [~stanleyxu2005] notes that merge isn't supported is that merge is an HDFS dataset
concept. We can write to temporary locations and then merge the data in by moving files in
HDFS. That ensures that all of the data is written to HDFS before we commit all of it by moving
the files. For HBase, writes take place as soon as the data is sent to the server (usually
batched and sent when a flush occurs), so there is no final step at which to commit. We need
to clearly define what should happen when a job fails partway through.
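The commit-by-move pattern described above can be sketched as a toy model. This is an illustration under stated assumptions, not Kite's implementation: a local temporary directory stands in for HDFS, `java.nio.file.Files.move` stands in for an HDFS rename, and the class and method names (`StagedFileCommit`, `writeAndCommit`, `runScenario`) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Stream;

/**
 * Toy model of "write to a temporary location, then commit by moving files".
 * A local temp directory stands in for HDFS; every name here is hypothetical
 * (not a Sqoop or Kite API).
 */
public class StagedFileCommit {

    /** Writes every record to a staging file, then moves the finished file into targetDir. */
    public static void writeAndCommit(Path stagingDir, Path targetDir, String fileName,
                                      List<String> records, boolean failBeforeCommit)
            throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(targetDir);
        Path staged = stagingDir.resolve(fileName);
        // Phase 1: all data goes to the staging location; targetDir is not touched yet.
        Files.write(staged, records, StandardCharsets.UTF_8);
        if (failBeforeCommit) {
            // Simulated task failure: the staged file is abandoned and never committed.
            throw new IllegalStateException("simulated failure before commit");
        }
        // Phase 2: commit by moving the complete file into place in a single step.
        Files.move(staged, targetDir.resolve(fileName), StandardCopyOption.ATOMIC_MOVE);
    }

    /** Counts the entries visible in a directory, closing the directory stream afterwards. */
    public static long countFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.count();
        }
    }

    /** Runs one job against a fresh directory tree and returns how many files were committed. */
    public static long runScenario(boolean failBeforeCommit) {
        try {
            Path root = Files.createTempDirectory("commit-demo");
            Path staging = root.resolve("_tmp");
            Path target = root.resolve("dataset");
            try {
                writeAndCommit(staging, target, "part-00000", List.of("a", "b"), failBeforeCommit);
            } catch (IllegalStateException simulated) {
                // The job failed; readers of target see no partial output.
            }
            return countFiles(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("files committed by failed job: " + runScenario(true));
        System.out.println("files committed by successful job: " + runScenario(false));
    }
}
```

Running `main` should report zero committed files for the failed job and one for the successful job; only the final move makes data visible in the target directory.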

If a failed job should leave none of its data in HBase, then we need to stage all of the data
and load it into HBase at once using [{{HFileOutputFormat}}|https://hbase.apache.org/book/arch.bulk.load.html].
I think this is the most reasonable approach, but it requires an update to Kite.
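The difference between writing to HBase directly and staging everything for a single load can be illustrated with a toy in-memory table. This is a hedged sketch, not the HBase API: a `TreeMap` stands in for the table, the class and method names (`StagedVsDirectLoad`, `writeDirect`, `writeStaged`) are hypothetical, and the final `putAll` stands in for the bulk-load step (in real HBase, HFiles written with `HFileOutputFormat` and then loaded with the `completebulkload` tool). The toy treats that load as one step; it does not model how a real bulk load is applied across regions.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Toy contrast between direct writes (each put is visible at once) and staged
 * writes (rows are buffered, then loaded in one step at the end). The in-memory
 * "table" and all names are hypothetical stand-ins, not HBase or Kite APIs.
 */
public class StagedVsDirectLoad {

    /** Direct mode: each row lands in the table immediately, so a failure leaves partial data. */
    public static SortedMap<String, String> writeDirect(List<Map.Entry<String, String>> rows, int failAt) {
        SortedMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < rows.size(); i++) {
            if (i == failAt) {
                return table; // simulated task failure: earlier rows are already in the table
            }
            table.put(rows.get(i).getKey(), rows.get(i).getValue());
        }
        return table;
    }

    /** Staged mode: rows are buffered (sorted by key) and loaded only after every row succeeds. */
    public static SortedMap<String, String> writeStaged(List<Map.Entry<String, String>> rows, int failAt) {
        SortedMap<String, String> table = new TreeMap<>();
        SortedMap<String, String> staging = new TreeMap<>();
        for (int i = 0; i < rows.size(); i++) {
            if (i == failAt) {
                return table; // simulated failure: staged rows are discarded, table is untouched
            }
            staging.put(rows.get(i).getKey(), rows.get(i).getValue());
        }
        table.putAll(staging); // single load step, standing in for a bulk load
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> rows =
                List.of(Map.entry("r1", "a"), Map.entry("r2", "b"), Map.entry("r3", "c"));
        System.out.println("direct, failure at index 2: " + writeDirect(rows, 2));
        System.out.println("staged, failure at index 2: " + writeStaged(rows, 2));
        System.out.println("staged, no failure:         " + writeStaged(rows, -1));
    }
}
```

With three rows and a simulated failure at index 2, direct mode leaves `{r1=a, r2=b}` in the table, staged mode leaves it empty, and staged mode without a failure loads all three rows.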

> TO-side: Write data to HBase
> ----------------------------
>                 Key: SQOOP-1744
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1744
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: connectors
>            Reporter: Qian Xu
>            Assignee: Qian Xu
>             Fix For: 1.99.5
> Propose to write data into HBase. Note that, unlike HDFS, HBase is append-only.
> Merge does not work for HBase.

This message was sent by Atlassian JIRA
