sqoop-dev mailing list archives

From "Gwen Shapira (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1414) Add support for Import from Kafka
Date Thu, 07 Aug 2014 02:09:11 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088664#comment-14088664 ]

Gwen Shapira commented on SQOOP-1414:

The planned syntax will be:

sqoop import --connection kafka:broker://broker_host:broker_port --table-name topic

I currently plan to implement:

First phase:
- No HBase, no Accumulo (A streaming solution makes more sense there)
- Assuming data in Kafka is String
- Single broker in connect string
- Exactly once semantics (using SimpleConsumer, checkpointing reads to HDFS)
- Limited to a single topic per Sqoop job
- Mapper per partition (no user control on number of mappers)
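The first-phase plan above (exactly-once via SimpleConsumer, with reads checkpointed to HDFS, one mapper per partition) could be sketched roughly like this. This is an illustrative Python sketch only, not the actual patch: the checkpoint file stands in for an HDFS checkpoint, and `run_mapper` stands in for a real mapper fetching from a broker.

```python
import json
import os

def load_checkpoint(path):
    """Return the last committed offset per partition; empty on first run."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        # JSON keys are strings; restore integer partition ids
        return {int(k): v for k, v in json.load(f).items()}

def save_checkpoint(path, offsets):
    """Write the checkpoint atomically so a crash never leaves a torn file
    (analogous to write-then-rename on HDFS)."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(offsets, f)
    os.replace(tmp, path)

def run_mapper(partition, messages, checkpoint_path):
    """One 'mapper' per partition: resume from the checkpoint, skip anything
    already committed, emit the rest, then commit the new high-water mark."""
    offsets = load_checkpoint(checkpoint_path)
    start = offsets.get(partition, 0)
    emitted = []
    for offset, msg in enumerate(messages):
        if offset < start:        # committed by a previous (possibly retried) run
            continue
        emitted.append(msg)       # in the real job: write the record to HDFS
        offsets[partition] = offset + 1
    save_checkpoint(checkpoint_path, offsets)
    return emitted
```

Rerunning the same job against the same checkpoint emits nothing new, which is the exactly-once property the plan is after: a failed run that never committed is simply replayed, and a committed run is never re-emitted.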

TBD later (possibly only on Sqoop2):
- Avro / Parquet (probably via Kite)
- Hive / HCat integration
- Pluggable Decoder
- Specify number of mappers
- List of brokers
- List of topics
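The pluggable-decoder item above might look something like the sketch below; the interface name and methods are purely illustrative assumptions, not anything from the actual design.

```python
import json
from abc import ABC, abstractmethod

class KafkaDecoder(ABC):
    """Hypothetical decoder contract: turn raw Kafka message bytes into a record."""
    @abstractmethod
    def decode(self, raw: bytes):
        ...

class StringDecoder(KafkaDecoder):
    """The first-phase behavior: treat every payload as a UTF-8 string."""
    def decode(self, raw: bytes) -> str:
        return raw.decode("utf-8")

class JsonDecoder(KafkaDecoder):
    """An example of an alternative a user could plug in later."""
    def decode(self, raw: bytes):
        return json.loads(raw)
```

The point of the abstraction is that the import job only ever calls `decode()`, so swapping the wire format never touches the mapper logic.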

> Add support for Import from Kafka 
> ----------------------------------
>                 Key: SQOOP-1414
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1414
>             Project: Sqoop
>          Issue Type: Improvement
>    Affects Versions: 1.4.4
>            Reporter: Gwen Shapira
>            Assignee: Gwen Shapira
> Kafka is an important data source for many organizations.  
> Support in Sqoop will allow users to easily run MapReduce jobs to read data from Kafka
> topics to HDFS in various formats and to integrate with Hive.

This message was sent by Atlassian JIRA
