sqoop-dev mailing list archives

From "Gwen Shapira (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1414) Add support for Import from Kafka
Date Thu, 07 Aug 2014 02:09:11 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088664#comment-14088664 ]

Gwen Shapira commented on SQOOP-1414:
-------------------------------------

The planned syntax will be:

sqoop import --connection kafka:broker://broker_host:broker_port --table-name topic
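
For illustration only, a minimal Python sketch of how a connect string of this shape could be split into host and port. The function name `parse_kafka_connect_string` and the `KafkaBroker` type are hypothetical and not part of Sqoop; rejecting a comma-separated broker list is an assumption that mirrors the single-broker limitation of the first phase.

```python
from typing import NamedTuple


class KafkaBroker(NamedTuple):
    """Hypothetical holder for the single broker named in the connect string."""
    host: str
    port: int


# Scheme prefix taken from the planned syntax: kafka:broker://broker_host:broker_port
PREFIX = "kafka:broker://"


def parse_kafka_connect_string(connect: str) -> KafkaBroker:
    """Split 'kafka:broker://host:port' into host and port (illustrative only)."""
    if not connect.startswith(PREFIX):
        raise ValueError("expected a connect string starting with " + PREFIX)
    rest = connect[len(PREFIX):]
    # Assumption: a broker list is rejected, matching the single-broker first phase.
    if "," in rest:
        raise ValueError("only a single broker is supported")
    host, sep, port = rest.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected host:port after " + PREFIX)
    return KafkaBroker(host, int(port))
```

For example, `parse_kafka_connect_string("kafka:broker://kafka01.example.com:9092")` yields a `KafkaBroker` with host `kafka01.example.com` and port `9092` (the host name is a placeholder).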

I currently plan to implement:

First phase:
- No HBase, no Accumulo (A streaming solution makes more sense there)
- Assuming data in Kafka is String
- Single broker in connect string
- Exactly-once semantics (using SimpleConsumer, checkpointing reads to HDFS)
- Limited to a single topic per Sqoop job
- Mapper per partition (no user control on number of mappers)
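
As a rough sketch of the offset bookkeeping implied by "mapper per partition" plus checkpointed reads, the Python below plans one split per partition, starting each at the last checkpointed offset. Everything here is an assumption for illustration: the names `Split`, `plan_splits`, `load_checkpoint` and `save_checkpoint` are hypothetical, a local JSON file stands in for the HDFS checkpoint, and the latest offsets are passed in rather than fetched from a broker. It does not show how output and checkpoint would be committed together, which real exactly-once delivery would also need.

```python
import json
import os
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Split:
    """Work for one mapper: a single partition and an offset range."""
    partition: int
    start: int  # first offset to read (inclusive)
    end: int    # offset to stop before (exclusive)


def load_checkpoint(path: str) -> Dict[int, int]:
    """Return the next offset to read per partition; empty if no checkpoint yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return {int(p): int(o) for p, o in json.load(f).items()}


def save_checkpoint(path: str, offsets: Dict[int, int]) -> None:
    """Write to a temp file, then rename, so readers never see a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({str(p): o for p, o in offsets.items()}, f)
    os.replace(tmp, path)


def plan_splits(latest: Dict[int, int], checkpoint: Dict[int, int]) -> List[Split]:
    """One split per partition, resuming from the checkpointed offset (0 if none)."""
    return [Split(p, checkpoint.get(p, 0), end) for p, end in sorted(latest.items())]
```

A job would call `plan_splits` at submission time, run one mapper per returned `Split`, and call `save_checkpoint` with each split's `end` offset only after the output has been written successfully, so that a failed run is re-read from the previous checkpoint.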

TBD later (possibly only on Sqoop2):
- Avro / Parquet (probably via Kite)
- Hive / HCat integration
- Pluggable Decoder
- Specify number of mappers
- List of brokers
- List of topics


> Add support for Import from Kafka 
> ----------------------------------
>
>                 Key: SQOOP-1414
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1414
>             Project: Sqoop
>          Issue Type: Improvement
>    Affects Versions: 1.4.4
>            Reporter: Gwen Shapira
>            Assignee: Gwen Shapira
>
> Kafka is an important data source for many organizations.
> Support in Sqoop will allow users to easily run MapReduce jobs to read data from Kafka
> topics to HDFS in various formats and to integrate with Hive.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
