storm-user mailing list archives

From Erik Weathers <eweath...@groupon.com>
Subject Re: HDFS Bolts -- partitioning output
Date Wed, 06 Jan 2016 23:48:49 GMT
hey Aaron,

We've also written a similar bolt at Groupon, though we aren't super
satisfied with the implementation. :)  We are begrudgingly using it because
there is no partitioning support in the OSS storm-hdfs bolt.

One thing I do like about our implementation, though, is the ability to
define your own "Partitioner" in each topology to do various types of
partitioning (date-based, message ID-based, topic-based, whatever).  It
would be great if your implementation had such logic too.  That is,
whenever the bolt needs the HDFS path for a tuple's data, it calls the
Partitioner to compute it.  The Partitioner can take the Tuple object and
an opaque key/value Configuration hash, which can carry items such as a
Kafka topic name to be included in the HDFS path.
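
A minimal sketch of what such a pluggable Partitioner could look like.
This is not the Groupon code: the interface name, the method name, the
"topic" config key, and the "timestamp" field are all hypothetical, and
plain Maps stand in for Storm's Tuple and the Configuration hash so the
example compiles without Storm on the classpath.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;

// Hypothetical interface: the bolt would call this once per tuple to
// decide which HDFS directory the tuple's data belongs in.
interface Partitioner {
    // tupleFields stands in for Storm's Tuple (field name -> value);
    // conf is the opaque key/value hash supplied by the topology.
    String partitionPath(Map<String, Object> tupleFields, Map<String, String> conf);
}

// Example partitioner combining topic-based and date-based partitioning.
class DateTopicPartitioner implements Partitioner {
    // One directory per UTC day, e.g. 2016/01/06.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

    @Override
    public String partitionPath(Map<String, Object> tupleFields, Map<String, String> conf) {
        // Assumed config key; falls back to "unknown" when it is not set.
        String topic = conf.getOrDefault("topic", "unknown");
        // Assumed tuple field holding the event time in epoch milliseconds.
        long epochMillis = ((Number) tupleFields.get("timestamp")).longValue();
        return "/" + topic + "/" + DAY.format(Instant.ofEpochMilli(epochMillis));
    }
}
```

A message ID-based or purely topic-based scheme would be another class
implementing the same interface, so each topology can pick its own
without changes to the bolt.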

- Erik

On Tue, Dec 29, 2015 at 7:12 AM, Aaron.Dossett <Aaron.Dossett@target.com>
wrote:

> Hi,
>
> My team was exploring changes to the HDFS bolts that would allow for
> partitioning the output, for example into directories corresponding to
> day.  This is different from the existing functionality to rotate files
> based on a set length of time.  For unrelated reasons, we are probably not
> going to pursue this further.  However, I have some code changes that
> implement most of this functionality for at least some partitioning use
> cases.  If there is interest from the user or developer community for this
> feature, I could get it in shape for a PR to get feedback about our
> implementation approach.
>
> Any feedback on this idea is welcome.  Thanks! -Aaron
>
