nifi-users mailing list archives

From Chris Herrera <chris.herrer...@gmail.com>
Subject Re: Efficiently caching API results in a NiFi controller service
Date Tue, 01 May 2018 14:14:11 GMT
I second Caffeine as well; I have used it very effectively in controller services.

> On May 1, 2018, at 9:12 AM, Otto Fowler <ottobackwards@gmail.com> wrote:
> 
> We used Guava in Apache Metron, but have switched to https://github.com/ben-manes/caffeine.
> I would recommend taking a look at that too.
> 
> 
> 
> On May 1, 2018 at 10:09:00, Charlie Meyer (charlie.meyer@civitaslearning.com) wrote:
> 
>> We do something very similar in a custom controller service and utilize a Guava cache (https://github.com/google/guava/wiki/CachesExplained) and have found it to work quite well.
>> 
>> On Tue, May 1, 2018 at 9:06 AM, Tim Dean <tim.dean@gmail.com> wrote:
>> Thanks Otto -
>> 
>> Unfortunately, the service being called doesn’t currently support full HTTP cache semantics. I could add full support, and it is probably the right thing to do in the long run. But for now I was hoping for a solution that didn’t require significant enhancement to the web service.
>> 
>> -Tim
>> 
>> 
>>> On May 1, 2018, at 5:47 AM, Otto Fowler <ottobackwards@gmail.com> wrote:
>>> 
>>> https://hc.apache.org/httpcomponents-client-ga/tutorial/html/caching.html ?
>>> 
>>> 
>>> On May 1, 2018 at 00:01:58, Tim Dean (tim.dean@gmail.com) wrote:
>>> 
>>>> Hello, 
>>>> 
>>>> I have a custom NiFi controller service that retrieves data from an external web service via HTTP requests. The results from these HTTP requests will be needed at various points throughout my process flow. In some situations, I could end up needing to access the HTTP response dozens or even hundreds of times.
>>>> 
>>>> Given that the results of the HTTP request rarely change, I’d like them to be cached by my service and returned to my processors when needed. I’d need some way to explicitly clear the cache for those occasions when the data in the service does change.
>>>> 
>>>> I’ve looked at using the DistributedMapCacheClientService implementation to cache my web service’s results, but it seems to connect to a server via a socket connection, and that doesn’t seem like it would be much more efficient than calling the web service directly. I’ve also looked at using the service’s state manager to store the results as state, but my data is a little more complex than what the documentation for state suggests is optimal: I don’t think my total map size will reach 1 MB, but it could.
>>>> 
>>>> Am I overthinking this? Would a simpler solution like creating a simple Java HashMap inside my controller service be adequate? I could empty the contents of the hash map whenever the controller service is enabled/disabled. Would the memory used by this kind of simplified local caching cause problems somewhere down the line?
>>>> 
>>>> Are there other caching strategies I should be considering? 
>>>> 
>>>> Thanks 
>>>> 
>>>> -Tim
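
For reference, the local-map idea from the original question can be sketched with the standard library alone. This is only an illustration under stated assumptions, not code from anyone in the thread: the class name `LocalResultCache` and the `loader` function (standing in for the HTTP call to the external web service) are hypothetical, and the map is unbounded, so it only suits data sets known to stay small.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of NiFi): a thread-safe in-memory cache that
 * calls the loader on a miss and keeps the result until explicitly cleared.
 */
final class LocalResultCache<K, V> {
    // ConcurrentHashMap rather than HashMap, because several processor
    // threads may call into the same controller service concurrently.
    private final Map<K, V> entries = new ConcurrentHashMap<>();

    // Assumed stand-in for the real HTTP request to the web service.
    private final Function<K, V> loader;

    LocalResultCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value; invokes the loader only when the key is absent. */
    V get(K key) {
        // computeIfAbsent is atomic per key. A null result from the loader
        // is not stored, so the next call for that key retries the loader.
        return entries.computeIfAbsent(key, loader);
    }

    /** Drops a single entry, e.g. when one upstream record is known to have changed. */
    void invalidate(K key) {
        entries.remove(key);
    }

    /** Drops everything, the explicit "clear the cache" operation asked about above. */
    void clear() {
        entries.clear();
    }

    int size() {
        return entries.size();
    }
}
```

In a controller service, `clear()` could be called from the methods annotated with `@OnEnabled` and `@OnDisabled`, plus whatever explicit refresh hook the service exposes. What this sketch lacks is size- or time-based eviction, which is the main thing Guava and Caffeine add on top of a plain map.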

