kafka-users mailing list archives

From Adam Kunicki <a...@streamsets.com>
Subject Re: What is the best way to write Kafka data into HDFS?
Date Thu, 11 Feb 2016 02:50:29 GMT
If you're looking for a lightweight solution with a friendly GUI (and fully
open source), check out streamsets.com.
It supports writing messages to a parameterized directory hierarchy (e.g.
partitioned Hive tables), with support for late records if your template
involves date/time variables.
Maximum messages per file and maximum file size are also fully configurable.
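To make the idea concrete, here is a minimal sketch of that kind of date/time directory templating and late-record detection. The `${YYYY}`-style variable syntax, function names, and the one-hour lateness threshold are all hypothetical illustrations, not StreamSets' actual template syntax or API:

```python
from datetime import datetime, timezone

def partition_path(template, ts_ms):
    """Expand hypothetical ${YYYY}/${MM}/${DD}/${HH} variables in a
    directory template using the record's own timestamp (epoch millis),
    so each record lands in its time-based partition."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return (template
            .replace("${YYYY}", f"{dt.year:04d}")
            .replace("${MM}", f"{dt.month:02d}")
            .replace("${DD}", f"{dt.day:02d}")
            .replace("${HH}", f"{dt.hour:02d}"))

def is_late(ts_ms, now_ms, threshold_ms=3_600_000):
    """A record is 'late' if its timestamp falls more than the threshold
    (here, an assumed one hour) behind the current wall-clock time, i.e.
    it belongs to a partition that has likely already been closed."""
    return now_ms - ts_ms > threshold_ms

# Example: a record with timestamp 0 maps to the 1970-01-01 partition.
print(partition_path("/data/events/${YYYY}/${MM}/${DD}", 0))
```

A writer built on this would route on-time records to `partition_path(...)` and divert records for which `is_late(...)` is true to a separate late-arrivals directory (or reopen the old partition), rather than mixing them into the current file.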

Full Disclosure: I'm an engineer actively working on the project.


On Wed, Feb 10, 2016 at 5:09 PM, R P <hadooper@outlook.com> wrote:

> Hello All,
>   New Kafka user here. What is the best way to write Kafka data into HDFS?
> I have looked into following options and found that Flume is quickest and
> easiest to setup.
> 1. Flume
> 2. KaBoom
> 3. Kafka Hadoop Loader
> 4. Camus -> Gobblin
> Although Flume can result in small-file problems when your data is
> partitioned and some partitions receive only sporadic data.
> What are some best practices and options to write data from Kafka to HDFS?
> Thanks,
> R P

Adam Kunicki
StreamSets | Field Engineer
mobile: 415.890.DATA (3282) | linkedin
