nifi-dev mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: Design question - restartable long-running processes
Date Tue, 01 Sep 2015 16:27:21 GMT
Rick,

There have been a few requests for a first-class state management feature,
and it is definitely on the community's radar.

Right now, a good example of the current approach would probably be the
ListHDFS processor. It uses a combination of a local state file and the
DistributedMapCache controller service.
In a cluster, ListHDFS would be scheduled to run only on the primary node,
so storing its state in the DistributedMapCache lets every node in the
cluster know where to pick up if the primary node changes.
A few other processors also use the local state file approach; I believe
GetHttp and GetSolr are two of them.
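
The combination described above (a node-local state file plus a cluster-wide cache) can be sketched roughly as follows. This is not ListHDFS's actual code. `SharedStateCache` is a hypothetical interface standing in for the DistributedMapCache controller service, whose real client API differs, and `CheckpointStore` and `InMemorySharedCache` are illustrative names only. Each checkpoint is written to a local properties file (replaced via an atomic move) and mirrored to the shared cache. On load, the shared copy is preferred, so a newly elected primary node can resume from the position the old primary last recorded.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a cluster-wide cache such as the DistributedMapCache service. */
interface SharedStateCache {
    Optional<String> get(String key);
    void put(String key, String value);
}

/** In-memory implementation for demonstration only; a real cache would be a network service. */
class InMemorySharedCache implements SharedStateCache {
    private final Map<String, String> map = new ConcurrentHashMap<>();
    public Optional<String> get(String key) { return Optional.ofNullable(map.get(key)); }
    public void put(String key, String value) { map.put(key, value); }
}

/** Persists checkpoints to a node-local properties file and mirrors them to the shared cache. */
class CheckpointStore {
    private final Path localFile;
    private final SharedStateCache cache;

    CheckpointStore(Path localFile, SharedStateCache cache) {
        this.localFile = localFile;
        this.cache = cache;
    }

    void save(String key, String value) throws IOException {
        Properties props = readLocal();
        props.setProperty(key, value);
        // Write a temp file, then atomically move it over the old one,
        // so a crash never leaves a half-written state file behind.
        Path tmp = localFile.resolveSibling(localFile.getFileName() + ".tmp");
        try (Writer w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            props.store(w, "checkpoint");
        }
        Files.move(tmp, localFile, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
        // Mirror to the shared cache so another node can resume after a primary-node change.
        cache.put(key, value);
    }

    Optional<String> load(String key) throws IOException {
        // Prefer the shared copy: it may have been written by a different (former primary) node.
        Optional<String> shared = cache.get(key);
        if (shared.isPresent()) {
            return shared;
        }
        // Fall back to this node's local file, e.g. when the shared cache is empty.
        return Optional.ofNullable(readLocal().getProperty(key));
    }

    private Properties readLocal() throws IOException {
        Properties props = new Properties();
        if (Files.exists(localFile)) {
            try (Reader r = Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
                props.load(r);
            }
        }
        return props;
    }
}
```

For example, if node A saves a key and node B later becomes primary, node B's `load` returns node A's value from the shared cache even though node B's own local file is empty.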

-Bryan


On Tue, Sep 1, 2015 at 11:59 AM, Rick Braddy <rbraddy@softnas.com> wrote:

> Hi,
>
> I have a NiFi design question. To process extremely large files (of any
> size), we intend to create a processor that reads the file in "chunks"
> and sends it as a multi-part FlowFile series, which avoids using up all
> available content repository and/or JVM space.
>
> One way would be to create our own state file that contains the latest job
> information (per thread/job), but that seems very clunky.
>
> The question is, with long-running processes like this that need to be
> restartable (without starting from the beginning on big files), are there
> any standard NiFi design patterns we should consider?
>
> Thanks in advance.
> Rick
>
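
The chunk-and-resume idea in the question can be sketched in plain Java, outside of any NiFi API. `ResumableChunkReader` and `ChunkSink` are hypothetical names: the sink stands in for whatever turns one chunk into one FlowFile of the multi-part series, and the checkpoint is a one-line local offset file, in the spirit of the local state file approach from the reply. The `maxChunks` parameter exists only so an interruption can be simulated.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical callback standing in for "emit one fragment of a multi-part FlowFile series". */
interface ChunkSink {
    void accept(long fragmentIndex, byte[] data) throws IOException;
}

/** Reads a file in fixed-size chunks, checkpointing the next offset so a restart resumes mid-file. */
class ResumableChunkReader {
    private final Path source;
    private final Path offsetFile;
    private final int chunkSize;

    ResumableChunkReader(Path source, Path offsetFile, int chunkSize) {
        this.source = source;
        this.offsetFile = offsetFile;
        this.chunkSize = chunkSize;
    }

    /** Returns the offset recorded by a previous run, or 0 if there is none. */
    long loadOffset() throws IOException {
        if (!Files.exists(offsetFile)) {
            return 0L;
        }
        return Long.parseLong(new String(Files.readAllBytes(offsetFile), StandardCharsets.UTF_8).trim());
    }

    private void saveOffset(long offset) throws IOException {
        // Temp file plus atomic move, so the checkpoint is never half-written.
        Path tmp = offsetFile.resolveSibling(offsetFile.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, offsetFile, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Emits at most maxChunks chunks starting from the saved offset; returns how many were emitted. */
    int run(ChunkSink sink, int maxChunks) throws IOException {
        long offset = loadOffset();
        int emitted = 0;
        try (FileChannel ch = FileChannel.open(source, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(chunkSize);
            while (emitted < maxChunks && offset < ch.size()) {
                buf.clear();
                // Fill the buffer with positional reads; only the final chunk may come up short.
                while (buf.hasRemaining()) {
                    if (ch.read(buf, offset + buf.position()) < 0) {
                        break; // end of file
                    }
                }
                int n = buf.position();
                if (n == 0) {
                    break;
                }
                // Every chunk before the last is full, so offset / chunkSize is the fragment index.
                sink.accept(offset / chunkSize, Arrays.copyOf(buf.array(), n));
                offset += n;
                saveOffset(offset); // checkpoint only after the chunk was handed off
                emitted++;
            }
        }
        return emitted;
    }
}
```

Because the offset is saved only after a chunk is handed to the sink, a crash between those two steps re-emits that chunk on restart (at-least-once delivery), so downstream steps should tolerate a duplicate fragment. Saving the offset before emitting would instead risk silently skipping a chunk.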
