nifi-users mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: Efficiently caching API results in a NiFi controller service
Date Tue, 01 May 2018 13:07:54 GMT
Tim,

The DMC (DistributedMapCache) works the way it does because the cached
data needs to be shared across a cluster. For example, a processor like
DetectDuplicate needs to detect duplicates across all NiFi nodes and
not just the local node, and the same is true of Wait/Notify.

In your case I don't think you need to share data across nodes, so
each NiFi node can have its own instance of your controller service,
which could hold a HashMap as you described.
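A minimal sketch of that per-node cache, assuming the HTTP call is abstracted as a Function<String, String> (the class name ApiResultCache and the fetcher are hypothetical, not NiFi API). It swaps a ConcurrentHashMap in for a plain HashMap because several processor threads may call the controller service at the same time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-node cache held by a controller service. Each NiFi node
// has its own instance, so nothing is shared across the cluster.
class ApiResultCache {
    // Thread-safe map: processors with multiple concurrent tasks may hit it at once.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Stand-in for the HTTP request to the external web service (assumption).
    private final Function<String, String> fetcher;

    ApiResultCache(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    // Returns the cached response, calling the fetcher only when the key is absent.
    String get(String key) {
        return cache.computeIfAbsent(key, fetcher);
    }

    // Empties the cache; a real service could call this from methods annotated
    // with NiFi's @OnEnabled / @OnDisabled lifecycle annotations.
    void clear() {
        cache.clear();
    }

    int size() {
        return cache.size();
    }
}
```

Repeated lookups of the same key hit the web service once until clear() is called.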

You could definitely clear the map when the service is enabled or
disabled, and you could also implement time-based strategies: for
example, if a cached value is older than a certain threshold, remove
it and re-fetch. It really depends on how you use the service.
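A minimal sketch of the time-based strategy, assuming a fixed maximum age per entry. ExpiringApiCache, the fetcher, and the injectable clock are hypothetical names introduced for illustration; the clock is a LongSupplier so that expiry can be tested without sleeping.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical cache that re-fetches a value once it is older than ttlMillis.
class ExpiringApiCache {
    // A cached response plus the time it was fetched.
    private static final class Entry {
        final String value;
        final long fetchedAtMillis;

        Entry(String value, long fetchedAtMillis) {
            this.value = value;
            this.fetchedAtMillis = fetchedAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, String> fetcher; // stand-in for the HTTP call (assumption)
    private final long ttlMillis;                   // hypothetical max-age threshold
    private final LongSupplier clock;               // e.g. System::currentTimeMillis

    ExpiringApiCache(Function<String, String> fetcher, long ttlMillis, LongSupplier clock) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    String get(String key) {
        long now = clock.getAsLong();
        // compute() is atomic per key, so concurrent callers for the same key
        // do not trigger duplicate fetches.
        Entry e = cache.compute(key, (k, old) ->
                (old == null || now - old.fetchedAtMillis > ttlMillis)
                        ? new Entry(fetcher.apply(k), now)
                        : old);
        return e.value;
    }

    // Explicit invalidation for when the upstream data is known to have changed.
    void clear() {
        cache.clear();
    }
}
```

In a real service the clock would typically be System::currentTimeMillis and the threshold a configurable property.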

I don't see any issues with memory as long as your cache doesn't grow
indefinitely.
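One way to keep the cache from growing indefinitely is to cap the number of entries and evict the least recently used one. This sketch assumes an entry-count limit is an acceptable proxy for memory; BoundedApiCache and the fetcher are hypothetical. It uses LinkedHashMap's access-order mode and its removeEldestEntry hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical size-bounded LRU cache for API responses.
class BoundedApiCache {
    private final Function<String, String> fetcher; // stand-in for the HTTP call (assumption)
    private final Map<String, String> lru;

    BoundedApiCache(Function<String, String> fetcher, int maxEntries) {
        this.fetcher = fetcher;
        // accessOrder=true keeps the least recently used entry first;
        // removeEldestEntry evicts it whenever an insert exceeds the cap.
        this.lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // synchronized because an access-ordered LinkedHashMap is modified even by get().
    // Note: the lock is held during the fetch, which serializes cache misses.
    synchronized String get(String key) {
        String value = lru.get(key);
        if (value == null) {
            value = fetcher.apply(key);
            lru.put(key, value);
        }
        return value;
    }

    synchronized boolean contains(String key) {
        return lru.containsKey(key);
    }

    synchronized int size() {
        return lru.size();
    }

    synchronized void clear() {
        lru.clear();
    }
}
```

With a cap of 2, reading "a", "b", "a", then "c" evicts "b", since "a" was touched more recently.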

-Bryan


On Tue, May 1, 2018 at 6:47 AM, Otto Fowler <ottobackwards@gmail.com> wrote:
> https://hc.apache.org/httpcomponents-client-ga/tutorial/html/caching.html ?
>
>
> On May 1, 2018 at 00:01:58, Tim Dean (tim.dean@gmail.com) wrote:
>
> Hello,
>
> I have a custom NiFi controller service that retrieves data from an external
> web service via HTTP requests. The results from these HTTP requests will be
> needed at various points throughout my process flow. In some situations, I
> could end up needing to access the HTTP response dozens or even hundreds of
> times.
>
> Given that the results of the HTTP request rarely change, I’d like them to
> be cached by my service and returned to my processors when needed. I’d need
> some way to explicitly clear the cache for those occasions when the data in
> the service does change.
>
> I’ve looked at using the DistributedMapCacheClientService implementation to
> cache my web service’s results, but it seems like that connects to a server
> via a socket connection and that doesn’t seem like it would be all that much
> more efficient than calling the web service directly. I’ve also looked at
> using the service’s state manager to store the results as state, but my data
> is a little more complex than what the documentation for state suggests is
> optimal: I don’t think my total map will reach 1MB in size, but it
> could.
>
> Am I overthinking this? Would a simpler solution like creating a simple Java
> HashMap inside my controller service be adequate? I could empty the contents
> of the hash map whenever the controller service is enabled/disabled. Would
> the memory used by this kind of simplified local caching cause problems
> somewhere down the line?
>
> Are there other caching strategies I should be considering?
>
> Thanks
>
> -Tim
>
>
