nifi-users mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: Efficiently caching API results in a NiFi controller service
Date Tue, 01 May 2018 14:14:29 GMT
I would say that yes, you should plan for concurrent access.

Since multiple processors will be using the same service, multiple
threads will be calling it. Even if only one processor calls the
service, increasing that processor's concurrent tasks means multiple
threads execute the processor, and therefore multiple threads call the
service.

If you use regular JDK classes, then ConcurrentHashMap would be a good
starting point. If you need more locking than that, a simple ReadWriteLock
can guard the cache, taking the read lock or the write lock depending on
the operation.
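
As a rough sketch of the ConcurrentHashMap approach (the class name
ApiResultCache and the loader function are made up for illustration and
are not part of the NiFi API), the service could hold something like the
following and call clear() from its enable/disable lifecycle methods:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical helper, not a NiFi class: a thread-safe cache that a
// controller service could delegate to.
final class ApiResultCache {

    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // Assumption: the loader wraps the HTTP call to the external web service.
    private final Function<String, String> loader;

    ApiResultCache(Function<String, String> loader) {
        this.loader = loader;
    }

    // Returns the cached value, invoking the loader only if the key is absent.
    // computeIfAbsent is atomic per key, so concurrent callers asking for the
    // same missing key trigger at most one load.
    String get(String key) {
        return cache.computeIfAbsent(key, loader);
    }

    // Explicitly drops all cached results, e.g. when the remote data changes
    // or when the service is enabled or disabled.
    void clear() {
        cache.clear();
    }

    int size() {
        return cache.size();
    }
}
```

One caveat: ConcurrentHashMap may block other updates to the map while
the function passed to computeIfAbsent is running, so a slow HTTP call
inside the loader can briefly hold up other writers. If that becomes a
problem, a get-then-putIfAbsent pattern or a caching library may be a
better fit.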

On Tue, May 1, 2018 at 10:12 AM, Otto Fowler <ottobackwards@gmail.com> wrote:
> We used guava in Apache Metron, but have switched to
> https://github.com/ben-manes/caffeine.
> I would recommend taking a look at that too.
>
>
>
> On May 1, 2018 at 10:09:00, Charlie Meyer
> (charlie.meyer@civitaslearning.com) wrote:
>
> We do something very similar in a custom controller service and utilize a
> guava cache (https://github.com/google/guava/wiki/CachesExplained) and have
> found it to work quite well
>
> On Tue, May 1, 2018 at 9:06 AM, Tim Dean <tim.dean@gmail.com> wrote:
>>
>> Thanks Otto -
>>
>> Unfortunately, the service being called doesn’t currently support full
>> HTTP cache semantics. I could add full support, and it is
>> probably the right thing to do in the long run. But for now I was hoping for
>> a solution that didn’t require significant enhancement to the web service.
>>
>> -Tim
>>
>>
>> On May 1, 2018, at 5:47 AM, Otto Fowler <ottobackwards@gmail.com> wrote:
>>
>> https://hc.apache.org/httpcomponents-client-ga/tutorial/html/caching.html
>> ?
>>
>>
>> On May 1, 2018 at 00:01:58, Tim Dean (tim.dean@gmail.com) wrote:
>>
>> Hello,
>>
>> I have a custom NiFi controller service that retrieves data from an
>> external web service via HTTP requests. The results from these HTTP requests
>> will be needed at various points throughout my process flow. In some
>> situations, I could end up needing to access the HTTP response dozens or
>> even hundreds of times.
>>
>> Given that the results of the HTTP request rarely change, I’d like them to
>> be cached by my service and returned to my processors when needed. I’d need
>> some way to explicitly clear the cache for those occasions when the data in
>> the service does change.
>>
>> I’ve looked at using the DistributedMapCacheClientService implementation
>> to cache my web service’s results, but it seems like that connects to a
>> server via a socket connection and that doesn’t seem like it would be all
>> that much more efficient than calling the web service directly. I’ve also
>> looked at using the service’s state manager to store the results as state,
>> but my data is a little more complex than what the documentation for state
>> suggests is optimal: I don’t think my total map size will reach 1MB,
>> but it is possible.
>>
>> Am I overthinking this? Would a simpler solution like creating a simple
>> Java HashMap inside my controller service be adequate? I could empty the
>> contents of the hash map whenever the controller service is
>> enabled/disabled. Would the memory used by this kind of simplified local
>> caching cause problems somewhere down the line?
>>
>> Are there other caching strategies I should be considering?
>>
>> Thanks
>>
>> -Tim
>>
>>
>
