nifi-users mailing list archives

From Joe Witt <>
Subject Re: Database Lookup Approach
Date Fri, 01 Jan 2016 15:37:52 GMT

My apologies for the response time.  For step 3 I was suggesting a
processor which takes a given flowfile attribute and looks that value
up against a cache to then populate further attributes or indicate a
miss.  In the event of a hit the values are populated and you're set.
In the case of a miss, route to the ExecuteSQL processor to get
the results.  After the ExecuteSQL processor, the results could be
routed to a processor that populates the cache.
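
The hit/miss routing described above can be sketched outside NiFi as a
plain cache-aside lookup. This is only an illustration under stated
assumptions, not a NiFi processor or any NiFi API: the names LookupCache
and enrich, the users table, and its department and region columns are
all hypothetical, and sqlite3 stands in for whatever database the flow
actually queries. The time-to-live is one simple way to approximate a
cache refresh policy.

```python
import sqlite3
import time


class LookupCache:
    """Hypothetical in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, time the value was stored)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            # Entry is stale: drop it so the caller goes back to the database.
            del self._entries[key]
            return None
        return value

    def put(self, key, value):
        self._entries[key] = (value, self._clock())


def enrich(attributes, cache, conn):
    """Look up attributes["username"]; return (route, enriched attributes).

    route is "hit" (served from cache), "miss" (queried the database and
    populated the cache), or "not_found" (no matching row).
    """
    username = attributes["username"]
    cached = cache.get(username)
    if cached is not None:
        return "hit", {**attributes, **cached}

    # Cache miss: this plays the role of routing to the database query step.
    # The table and column names are assumptions made for this sketch.
    row = conn.execute(
        "SELECT department, region FROM users WHERE username = ?",
        (username,),
    ).fetchone()
    if row is None:
        return "not_found", dict(attributes)

    result = {"department": row[0], "region": row[1]}
    cache.put(username, result)  # populate the cache for later lookups
    return "miss", {**attributes, **result}


def make_demo_db():
    """Build a throwaway in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (username TEXT PRIMARY KEY, department TEXT, region TEXT)"
    )
    conn.execute("INSERT INTO users VALUES ('indus', 'engineering', 'east')")
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    cache = LookupCache(ttl_seconds=300)
    for _ in range(2):
        # The first call goes to the database, the second is served from cache.
        print(enrich({"username": "indus"}, cache, conn))
```

A shorter time-to-live suits a database that is updated frequently, at
the cost of more queries; reference data that rarely changes can
tolerate a much longer one.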

Joe Percivall: As part of the work you've been doing with templates
this is a really good use case to document as this is a fairly common
pattern and we leave a lot of the exercise up to the reader.  We
should make a template to do this, identify gaps where we should
provide features, and publish a good guide.


On Sun, Nov 15, 2015 at 11:40 AM, indus well <> wrote:
> Thanks, Joe. Yes, the referenced lookup data does not change frequently so
> your built-in caching strategy sounds great. Would you expand on how to
> implement the built-in caching mechanism as well as refreshing the cache in
> NiFi please?
> Thanks,
> Indus
> On Sat, Nov 14, 2015 at 12:24 PM, Joe Witt <> wrote:
>> Indus,
>> Let's project out one order of magnitude.  That puts you at just a bit
>> more than 500 lookups per second.  You will want to consider the
>> properties of the data itself and the properties of the database.
>> This helps you decide on the appropriate caching and querying
>> strategy.  If the data as-is lends itself to highly effective caching
>> behavior then you're good.  If not you may choose to intentionally
>> merge/batch data together to gain higher cache hits.  With regard to
>> the database itself if it is being updated frequently then you'll need
>> a tighter cache refresh policy and so on.  Anyway, such cases and a
>> fairly wide combination of scenarios have been played out with great
>> effect in NiFi.  We have some of the tools you'll need out of the box
>> to do this and some you'll possibly need to build for your database or
>> caching strategy or if you want to edit the JSON data based on the
>> results of the database lookup.
>> So general flow logic would be like this (not specific to any existing
>> processor)
>> 1) Consume JSON object
>> 2) Extract username from JSON object as a flow file attribute
>> 3) Execute SQL statement to look up results of query with username.
>> This could be a single processor which has a built-in caching
>> mechanism or a processor that first looks up in a cache then, if it
>> misses, looks up against the actual database and then populates the cache.
>> 4) The attributes of the flow file now contain the results of the
>> query.  Update the JSON object content if necessary using attributes
>> or just make an attribute based routing decision
>> The key part of this is really 3.  Many ways to go after that.
>> Thanks
>> Joe
>> On Sat, Nov 14, 2015 at 12:46 PM, indus well <> wrote:
>> > Hi NiFi Experts:
>> >
>> > I have a use case where I consume a JSON message containing a userID and
>> > need to look up the value from a database table. The transaction volume is
>> > averaging around 5 million per day and growing. What would be the best
>> > approach for this scenario?
>> >
>> > Thanks,
>> >
>> > Indus
>> >
>> >
