chukwa-user mailing list archives

From Popa Nicolae <>
Subject Re: ZIP format/ S3 storage system support
Date Wed, 10 Jan 2018 13:18:17 GMT
Thank you for your response.

My goal is to build a tool that can grep through large sets of logs
(1 TB and up) and answer queries in a reasonable amount of time (from
seconds to several minutes, at most 1 hour). The logs are stored in S3
(for example, they are produced by EMR jobs) in a compressed format
(e.g. gzip or LZO). I expect that some performance tuning will be needed
at the end in order to meet my performance targets.
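
For illustration, the core operation described above (matching a regular
expression against a gzip-compressed log while streaming it, without
unpacking it to disk) can be sketched as follows. This is a hypothetical
sketch using only the Python standard library, not Chukwa code: the
function name `grep_gzip` and the sample log lines are assumptions, and an
in-memory buffer stands in for an object fetched from S3.

```python
import gzip
import io
import re
from typing import BinaryIO, Iterator, Tuple


def grep_gzip(stream: BinaryIO, pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line of a gzip stream that matches."""
    regex = re.compile(pattern)
    # gzip.open accepts an already-open binary file object; in text mode it
    # decodes line by line, so the full archive is never held in memory.
    with gzip.open(stream, mode="rt", encoding="utf-8", errors="replace") as lines:
        for number, line in enumerate(lines, start=1):
            if regex.search(line):
                yield number, line.rstrip("\n")


# Hypothetical sample data standing in for a compressed log object.
sample = b"INFO job started\nERROR task 7 failed\nINFO job finished\n"
compressed = io.BytesIO(gzip.compress(sample))
matches = list(grep_gzip(compressed, r"ERROR"))
```

A single-threaded scan like this would not reach the stated time targets
at the 1 TB scale; it only shows the per-file step that a distributed job
would run in parallel.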

I don't know your current roadmap, but I would like to contribute to
Chukwa by adding support for reading and storing compressed logs in
different formats (e.g. gzip, bzip2, LZO, Snappy). I will also test Chukwa
with S3 as an input source to see whether it works, and contribute there
too if necessary. Are you interested in these kinds of contributions? Does
your roadmap include any performance tuning tasks?


On 9 January 2018 at 18:33, Popa Nicolae <> wrote:

> Hello guys,
> I am new to Apache Chukwa and I was exploring the possibility of using it
> for one of my use cases. While reading the documentation, I didn't
> find any mention of ZIP format support or the S3 storage system.
> 1. Does Chukwa support reading and storing ZIP archives?
> 2. Besides the HDFS file system, does Chukwa support reading from and
> writing to Amazon S3 storage?
> Thank you,
> Flavian
