nifi-users mailing list archives

From Nicholas Hughes
Subject Re: DistributedMapCache w/ ListSFTP and FetchSFTP
Date Thu, 15 Dec 2016 21:39:31 GMT

Thank you for the quick response. So, the DMC is just so you won't
duplicate fetches if you're listing faster than you're fetching... got it.
The usage documentation is kinda vague about that, so I made it out to be
more magical than it is. Thanks for pointing me in the right direction!


On Thu, Dec 15, 2016 at 4:21 PM, Pierre Villard wrote:

> Hi Nicholas,
> You need to configure your ListSFTP processor to run only on the primary
> node (scheduling strategy in the processor configuration). Then send the
> flow files to an RPG (Remote Process Group) that points to an input port
> in the cluster itself, so that the flow files are distributed over the
> cluster and do not stay only on the primary node. The FetchSFTP processor
> will then take care of downloading the files. ListSFTP, with its state
> (DistributedCache), ensures that you don't download the same file twice,
> and a given file won't be downloaded by two nodes at the same time.
> Hope this helps,
> Pierre.
> 2016-12-15 22:13 GMT+01:00 Nicholas Hughes:
>> I'm testing a simple List/Fetch setup on a 3 node cluster. I created a
>> DistributedMapCacheServer controller service with the default settings (no
>> SSL) and then created a DistributedMapCacheClientService that points at
>> one of the cluster hostnames. The ListSFTP processor is set to use the
>> Distributed Cache Service that I created.
>> The ListSFTP processor lists the same 100 source files from the remote
>> system on each node, and sends 300 Flow Files downstream to the FetchSFTP
>> processor. I thought that the map cache allowed the cluster nodes to
>> determine which files had already been listed by other cluster nodes...
>> maybe I'm missing something.
>> Any assistance is appreciated.
>> NiFi version 1.0.0 in HDF 2.0.1
>> -Nick
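
The difference between the original setup (every node lists) and the one Pierre describes (list on the primary node only, then distribute before fetching) can be sketched with a toy simulation. This is not NiFi code and calls no NiFi API. The node names, file names, and all three functions are hypothetical. The "state" here is a plain set of file names, which is a simplification, and the real processor's bookkeeping may differ.

```python
from itertools import cycle

NODES = ["node1", "node2", "node3"]  # hypothetical 3-node cluster
REMOTE_FILES = [f"file_{i:03d}.dat" for i in range(100)]  # hypothetical remote listing


def list_on_all_nodes(remote_files, nodes):
    """Every node runs its own listing, so each file is emitted once per node."""
    return [(node, name) for node in nodes for name in remote_files]


def list_on_primary(remote_files, already_listed):
    """Only the primary lists; remembered state filters out files seen before."""
    new_files = [name for name in remote_files if name not in already_listed]
    already_listed.update(new_files)
    return new_files


def distribute(flow_files, nodes):
    """Round-robin stand-in for the RPG spreading flow files across the cluster."""
    assignment = {node: [] for node in nodes}
    for node, name in zip(cycle(nodes), flow_files):
        assignment[node].append(name)
    return assignment


# Original setup: 3 nodes x 100 files = 300 flow files sent downstream.
duplicated = list_on_all_nodes(REMOTE_FILES, NODES)

# Primary-only listing: 100 flow files on the first run, none on a repeat run.
state = set()
first_run = list_on_primary(REMOTE_FILES, state)
second_run = list_on_primary(REMOTE_FILES, state)

# Each file is assigned to exactly one node for fetching.
fetch_plan = distribute(first_run, NODES)
```

In this sketch the duplicate flow files disappear because only one node lists, not because of any shared cache. The remembered state only prevents re-listing files on later runs.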
