kafka-users mailing list archives

From Chetan Conikee <coni...@gmail.com>
Subject Re: S3 Consumer
Date Sat, 29 Dec 2012 06:15:07 GMT
Noticed this S3-based consumer project on GitHub

On Dec 27, 2012, at 7:08 AM, David Arthur <mumrah@gmail.com> wrote:

> I don't think anything like this exists in Kafka (or contrib), but it would be a useful
> addition! Personally, I have written this exact thing at previous jobs.
> As for the Hadoop consumer, since there is a FileSystem implementation for S3 in Hadoop,
> it should be possible. The Hadoop consumer works by writing out data files containing the
> Kafka messages alongside offset files, which contain the last offset read for each partition.
> If it is re-consuming from zero each time you run it, it means it's not finding the offset
> files from the previous run.
> Having used it a bit, the Hadoop consumer is certainly an area that could use improvement.
> HTH,
> David
> On 12/27/12 4:41 AM, Pratyush Chandra wrote:
>> Hi,
>> I am looking for an S3-based consumer that can write all the received
>> events to an S3 bucket (say, every minute). Something similar to Flume HDFSSink
>> http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
>> I have tried evaluating hadoop-consumer in the contrib folder. But it seems to
>> be more for offline processing, which will fetch everything from offset 0
>> at once and replace it in the S3 bucket.
>> Any help would be appreciated.
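
The offset-file approach David describes (data files written alongside a per-partition offset file, with the next run resuming from that file) can be sketched roughly as below. This is only an illustration under stated assumptions, not the Hadoop consumer's actual code: a Python list stands in for a Kafka partition, a local directory stands in for the S3 bucket, and the function names, file names, and JSON layout are all hypothetical. It also stores the next offset to read rather than the last offset read, purely for simplicity.

```python
import json
import os


def offset_path(out_dir, partition):
    # Hypothetical naming scheme: one offset file per partition.
    return os.path.join(out_dir, f"partition-{partition}.offset")


def read_next_offset(out_dir, partition):
    """Return the offset to resume from, or 0 if no offset file is found."""
    path = offset_path(out_dir, partition)
    if not os.path.exists(path):
        # This is the "re-consuming from zero" case: the offset file
        # from the previous run was not found.
        return 0
    with open(path) as f:
        return json.load(f)["next_offset"]


def run_once(log, out_dir, partition=0, batch_size=100):
    """Consume one batch from `log` (a list standing in for a partition).

    Writes one data file for the batch, then records the next offset.
    Returns the number of messages written.
    """
    start = read_next_offset(out_dir, partition)
    batch = log[start:start + batch_size]
    if not batch:
        return 0
    end = start + len(batch)

    # The data file name encodes the offset range, so each run adds a new
    # file instead of replacing earlier output.
    data_file = os.path.join(
        out_dir, f"partition-{partition}-{start:010d}-{end:010d}.txt"
    )
    with open(data_file, "w") as f:
        for message in batch:
            f.write(message + "\n")

    # Write the offset file last, via a temp file and rename, so a crash
    # before this point repeats the batch on the next run rather than
    # skipping it.
    tmp = offset_path(out_dir, partition) + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_offset": end}, f)
    os.replace(tmp, offset_path(out_dir, partition))
    return len(batch)
```

Calling run_once on a schedule (say, every minute) would then add one new file per interval. If the offset file is missing or written somewhere the next run does not look, read_next_offset falls back to 0, which matches the symptom described in the thread.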
