nifi-users mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: Database Lookup Approach
Date Sat, 14 Nov 2015 18:24:19 GMT
Indus,

Let's project out one order of magnitude.  That puts you at just a bit
more than 500 lookups per second.  You will want to consider the
properties of the data itself and the properties of the database, as
these drive the appropriate caching and querying strategy.  If the
data as-is lends itself to highly effective caching behavior then
you're good.  If not, you may choose to intentionally merge/batch data
together to gain a higher cache hit rate.  With regard to the database
itself, if it is being updated frequently then you'll need a tighter
cache refresh policy, and so on.

Such cases, across a fairly wide range of scenarios, have been played
out to great effect in NiFi.  We have some of the tools you'll need
out of the box, and some you may need to build for your particular
database or caching strategy, or if you want to edit the JSON data
based on the results of the database lookup.
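To make the "tighter cache refresh policy" idea concrete, here is a
minimal sketch (not NiFi code; class and parameter names are my own)
of a TTL cache, where shrinking the TTL trades cache hit rate for
freshness against a frequently updated database:

```python
import time

class TtlCache:
    """Minimal TTL cache sketch: entries expire after ttl_seconds,
    forcing a fresh database lookup.  A tighter refresh policy is
    simply a smaller TTL."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, inserted_at)

    def get(self, key, now=None):
        # 'now' is injectable for testing; defaults to the clock.
        now = time.monotonic() if now is None else now
        entry = self.store.get(key)
        if entry is None:
            return None
        value, inserted_at = entry
        if now - inserted_at > self.ttl:
            del self.store[key]  # expired: evict and report a miss
            return None
        return value

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.store[key] = (value, now)
```

On a `get` miss the caller queries the database and `put`s the result
back; how small the TTL needs to be depends entirely on how stale a
lookup result your flow can tolerate.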

So the general flow logic would look like this (not specific to any
existing processor):

1) Consume the JSON object.
2) Extract the username from the JSON object as a flow file attribute.
3) Execute a SQL query against the database using that username.
This could be a single processor with a built-in caching mechanism,
or a processor that first checks a cache and, on a miss, queries the
actual database and then populates the cache.
4) The flow file's attributes now contain the results of the query.
Update the JSON content from the attributes if necessary, or just
make an attribute-based routing decision.
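The steps above can be sketched in plain Python (this is not NiFi
API code; a dict stands in for flow file attributes, an in-memory
sqlite database and a made-up users/department/region schema stand
in for the real table):

```python
import json
import sqlite3

def lookup_user(username, cache, conn):
    """Step 3: check the cache first; on a miss, query the database
    and populate the cache.  Table and column names here are
    hypothetical stand-ins for the real schema."""
    row = cache.get(username)
    if row is None:
        cur = conn.execute(
            "SELECT department, region FROM users WHERE username = ?",
            (username,))
        fetched = cur.fetchone()
        if fetched is None:
            return None  # no such user; route accordingly
        row = {"department": fetched[0], "region": fetched[1]}
        cache[username] = row
    return row

def enrich(json_bytes, cache, conn):
    """Steps 1, 2 and 4: parse the JSON, extract the username as an
    attribute, run the lookup, and copy the results onto the
    attribute dict for downstream routing or content updates."""
    obj = json.loads(json_bytes)
    attributes = {"username": obj["username"]}
    result = lookup_user(obj["username"], cache, conn)
    if result is not None:
        attributes.update(result)
    return attributes
```

The second call for the same username is served from the cache and
never touches the database, which is what makes the 500/sec
projection comfortable when the data caches well.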

The key part of this is really step 3.  There are many ways to go
after that.

Thanks
Joe

On Sat, Nov 14, 2015 at 12:46 PM, indus well <induswell@gmail.com> wrote:
> Hi NiFi Experts:
>
> I have a use case where I consume a JSON message containing a userID and
> need to look up a value from a database table. The transaction volume
> averages around 5 million per day and is growing. What would be the best
> approach for this scenario?
>
> Thanks,
>
> Indus
>
>
