nifi-users mailing list archives

From James Wing <jvw...@gmail.com>
Subject Re: Nifi partition data by date
Date Thu, 03 Nov 2016 01:01:58 GMT
This is absolutely possible.  A sample sequence of processors might include:

1. ExtractText (or EvaluateJsonPath for JSON records) - to extract a record
date from the flowfile content into an attribute, 'recordgroup' for example
2. MergeContent - to group related records together, setting the
Correlation Attribute Name property to use 'recordgroup'
3. UpdateAttribute - (optional) to apply the 'recordgroup' attribute to the
'path' and/or 'filename' attributes, depending on how you do #4.  May be
useful to get customized filenames with extensions.
4. Put* - to write the grouped file to storage (PutFile, PutHDFS,
PutS3Object, etc.).  With PutHDFS for example, use Expression Language in
the Directory property to apply your grouping - like
'/tmp/hive/records/dt=${recordgroup}' to get
'/tmp/hive/records/dt=2016-01-01', which matches the Hive partition
directory naming you describe.
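
To make the idea concrete outside of NiFi, here is a rough Python sketch of
the same sequence: pull a date out of each record, group records by that
date, and write one file per group into a dt=<date> directory.  This is not
NiFi code.  The function names, the date regex, the "unknown" fallback group,
and the records-<date>.log filename are all assumptions made up for
illustration.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Assumption: each log line contains a date formatted as YYYY-MM-DD.
DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})")


def record_group(line):
    """Step 1: derive the grouping key (the 'recordgroup' value) from content."""
    match = DATE_RE.search(line)
    return match.group(1) if match else "unknown"


def partition_records(lines):
    """Step 2: collect records that share the same grouping key."""
    groups = defaultdict(list)
    for line in lines:
        groups[record_group(line)].append(line)
    return dict(groups)


def write_partitions(groups, base_dir):
    """Steps 3 and 4: build a path from the key and write one file per group."""
    written = []
    for group, lines in sorted(groups.items()):
        target_dir = Path(base_dir) / f"dt={group}"
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / f"records-{group}.log"  # hypothetical filename
        target.write_text("\n".join(lines) + "\n")
        written.append(target)
    return written


if __name__ == "__main__":
    logs = [
        "2016-01-01 12:00:00 INFO started",
        "2016-01-02 00:00:01 INFO rolled over",
        "2016-01-01 12:05:00 WARN slow request",
    ]
    with tempfile.TemporaryDirectory() as base:
        for path in write_partitions(partition_records(logs), base):
            print(path.relative_to(base))
```

In a real flow the grouping is continuous rather than one batch, which is
where the MergeContent settings come in.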

In concept, it's that simple.  The #2 MergeContent step can be more
complicated as you consider how many files should be output from the
stream, how big they should be, how frequently, and how many bins are
likely to be open collecting files at any one time.  You might also
consider compressing the files.
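
As a way to reason about those trade-offs, here is a small Python model of
binning.  It is a simplified sketch, not how MergeContent is implemented: it
assumes a bin is emitted once it reaches a maximum entry count, and that the
oldest open bin is emitted early when a new group arrives and the limit on
open bins has been reached.  The function name and parameters are
hypothetical, and time-based limits are left out.

```python
from collections import OrderedDict


def simulate_bins(groups, max_entries, max_open_bins):
    """Return (group, record_count) for each output file, in emit order.

    groups: the grouping key of each incoming record, in arrival order.
    """
    if max_entries < 1 or max_open_bins < 1:
        raise ValueError("max_entries and max_open_bins must be at least 1")

    open_bins = OrderedDict()  # group -> record count, oldest bin first
    outputs = []
    for group in groups:
        if group not in open_bins:
            if len(open_bins) >= max_open_bins:
                # No room for a new bin: emit the oldest one early.
                outputs.append(open_bins.popitem(last=False))
            open_bins[group] = 0
        open_bins[group] += 1
        if open_bins[group] >= max_entries:
            # Bin is full: emit it and free the slot.
            outputs.append((group, open_bins.pop(group)))
    # End of stream: emit whatever is still open.
    outputs.extend(open_bins.items())
    return outputs


if __name__ == "__main__":
    stream = ["2016-01-01", "2016-01-01", "2016-01-02", "2016-01-03", "2016-01-01"]
    for group, count in simulate_bins(stream, max_entries=2, max_open_bins=2):
        print(group, count)
```

With too few open bins for the number of dates in flight, the same date ends
up spread across several small files, which is the effect to watch for when
sizing the merge step.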

Thanks,

James

On Wed, Nov 2, 2016 at 5:34 PM, Santiago Ciciliani <
santiago.ciciliani@gmail.com> wrote:

> I'm trying to split a stream of data into multiple different files based
> on the content date.
>
> So imagine that you are receiving streams of logs and you want to save as
> a Hive partitioned table so for example all records with date 2016-01-01
> into directory dt=2016-01-01.
>
> Is this even possible?
>
> Thanks
>
>
