nifi-users mailing list archives

From Charlie Frasure <>
Subject Re: archive files
Date Mon, 23 Nov 2015 16:37:18 GMT

This is a continuous task.  The main intent is to keep a version of each
file as it was before any conversions.  Ideally, it would be highly
compressed and easy to locate.  In the best case, the archive files would
hold the contents of highly structured nested directories.  File sizes range
from a few bytes to just under 1 GB.  It wouldn't have to run in real time
(updating archives seems to be a fairly intensive task), but it would
probably run at least every few days.
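
For illustration only, here is a rough sketch of the kind of periodic job
described above, written as a standalone Python script rather than a NiFi
flow.  It assumes the files have already been written (for example by
PutFile) to a staging tree laid out as <category>/<YYYYMM>/..., and that the
archives go to a separate output tree.  The function name, the staging and
output paths, and the <category>_<YYYYMM>.tar.gz naming are all hypothetical
choices, not anything NiFi provides.

```python
import tarfile
from pathlib import Path


def archive_month_dirs(root: Path, out_root: Path) -> list[Path]:
    """Write one .tar.gz per <category>/<YYYYMM> directory found under root.

    Assumes out_root lies outside root, so archives are never packed
    into themselves.
    """
    written = []
    for month_dir in sorted(root.glob("*/*")):
        name = month_dir.name
        # Only <category>/<six digits> directories count as archive units.
        if not (month_dir.is_dir() and name.isdigit() and len(name) == 6):
            continue
        category = month_dir.parent.name
        target_dir = out_root / category / name
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / f"{category}_{name}.tar.gz"
        # Rebuild the archive on every run: a gzip-compressed tar cannot
        # be appended to in place.
        with tarfile.open(target, "w:gz") as tar:
            for path in sorted(month_dir.rglob("*")):
                if path.is_file():
                    # Store paths relative to the month directory so the
                    # nested layout is preserved inside the archive.
                    tar.add(path, arcname=str(path.relative_to(month_dir)))
        written.append(target)
    return written


if __name__ == "__main__":
    # Hypothetical locations; run this from cron every few days.
    for archive in archive_month_dirs(Path("staging"), Path("archive")):
        print(archive)
```

Rebuilding each archive from scratch keeps the script simple, at the cost of
re-reading every file in a month directory on each run, which fits the "every
few days" schedule better than a real-time one.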


On Mon, Nov 23, 2015 at 11:08 AM, Joe Witt <> wrote:

> Charlie,
> I can give some pointers on how to get in the ballpark with this, but I
> want to make sure we have a good alignment of purpose here.  NiFi has
> from time to time come up as an intuitive way to build an archive
> management tool and it is always "not quite right" because of the
> subtle differences between continuous streams of information and
> ad-hoc sort of one-time tasks.
> Would this be a continuous task (always running) even if it is slow
> (every few minutes, hours, days) or would it be a one-time thing to
> move a bunch of data from one place to another?
> The difference sounds very minor but it will help me to understand how
> best to respond.
> Thanks
> Joe
> On Mon, Nov 23, 2015 at 10:54 AM, Charlie Frasure
> <> wrote:
> > Use case: Archive and compress files by category and month, store like
> > files in a common directory.
> >
> > I'm already processing the files, and have extracted the interesting
> > attributes from each.  I ran them through MergeContent, but have not been
> > able to produce a logical directory structure to store the results.  I
> > would prefer something like
> > archive/categoryA/201511/somefilename.tar.gz where
> > somefilename is made up of all the categoryA files received in November
> > 2015.
> >
> > I switched gears and used PutFile to store the files in the preferred
> > directory structure, but I am at a loss as to how to archive them within
> > their folders, given hundreds of dynamic categories and new date folders
> > every month.
> >
> > I'm playing with MergeContent's Correlation Attribute Name, but am also
> > considering trying the "Defragment" merge strategy by correlating the
> > files earlier in the process.
> >
> > Any suggestions would be appreciated.
