kafka-users mailing list archives

From Adam Kunicki <a...@streamsets.com>
Subject Re: What is the best way to write Kafka data into HDFS?
Date Thu, 11 Feb 2016 02:50:29 GMT
If you're looking for a lightweight solution with a friendly GUI (and fully
open source), check out streamsets.com.
It supports writing messages into a parameterized directory hierarchy (e.g.
partitioned Hive tables) and handles late records when your path template
involves date/time variables.
Maximum records per file and maximum file size are also fully configurable.
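
To make the directory-hierarchy and file-rolling idea concrete outside of
any particular tool, here is a minimal sketch using the plain Kafka consumer
and HDFS client APIs. This is not StreamSets code; the topic name, paths,
group id, and roll thresholds are made-up placeholders.

    // Generic sketch: consume from Kafka, write into a date-partitioned HDFS
    // layout, and roll files by record count or size. Placeholders throughout.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.nio.charset.StandardCharsets;
    import java.time.Duration;
    import java.time.LocalDate;
    import java.util.Collections;
    import java.util.Properties;

    public class KafkaToHdfsSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            props.put("group.id", "hdfs-writer");             // placeholder
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            long maxRecordsPerFile = 100_000;    // roll by record count
            long maxBytesPerFile   = 128L << 20; // roll by size (~one HDFS block)

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("events")); // placeholder topic
                FileSystem fs = FileSystem.get(new Configuration());

                FSDataOutputStream out = null;
                long recordsInFile = 0;

                while (true) {
                    ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        if (out == null) {
                            // Parameterized directory: Hive-style daily partition.
                            Path dir = new Path("/data/events/dt=" + LocalDate.now());
                            fs.mkdirs(dir);
                            out = fs.create(new Path(dir, "part-" + System.currentTimeMillis()));
                            recordsInFile = 0;
                        }
                        out.write((record.value() + "\n").getBytes(StandardCharsets.UTF_8));
                        recordsInFile++;
                        // Roll the file when either threshold is reached.
                        if (recordsInFile >= maxRecordsPerFile || out.getPos() >= maxBytesPerFile) {
                            out.close();
                            out = null;
                        }
                    }
                    consumer.commitSync();
                }
            }
        }
    }

A real pipeline would also need to handle partition rebalancing, retries,
and late/out-of-order records, which is exactly the work the tools discussed
in this thread take care of for you.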

Full Disclosure: I'm an engineer actively working on the project.

-Adam

On Wed, Feb 10, 2016 at 5:09 PM, R P <hadooper@outlook.com> wrote:

> Hello All,
>   New Kafka user here. What is the best way to write Kafka data into HDFS?
> I have looked into following options and found that Flume is quickest and
> easiest to setup.
>
> 1. Flume
> 2. KaBoom
> 3. Kafka Hadoop Loader
> 4. Camus -> Gobblin
>
> However, Flume can result in small-file problems when your data is
> partitioned and some partitions generate data only sporadically.
>
> What are some best practices and options to write data from Kafka to HDFS?
>
> Thanks,
> R P
>


-- 
Adam Kunicki
StreamSets | Field Engineer
mobile: 415.890.DATA (3282) | linkedin
