nifi-users mailing list archives

From Joe Percivall <joeperciv...@yahoo.com>
Subject Re: PutDistributedMapCache
Date Tue, 12 Jan 2016 15:45:57 GMT
Hello Sudeep,
We are currently lacking a "GetDistributedMapCache" processor that corresponds to the "PutDistributedMapCache".
I created a ticket[1] and will be working on it today. If you have any comments, configuration
suggestions, etc., please let me know or comment on the ticket.
[1] https://issues.apache.org/jira/browse/NIFI-1382

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com
 

    On Tuesday, January 12, 2016 9:46 AM, sudeep mishra <sudeepshekharm@gmail.com> wrote:
 

 Thanks, Matt.
In my data flow I need to perform certain validations on the data. I load some SQL Server
data into HDFS using Sqoop (not part of the NiFi flow). For each record in the HDFS file I have to
query another database and then save the validated record back to HDFS, where it will be processed
by some Spark jobs.
Since I have to query the database for every record, I was planning to cache the database records against
which I validate the HDFS data. That is why I was evaluating the DistributedMapCacheServer, but it looks
like its purpose is different. Alternatively, can we integrate Redis or another distributed
cache with NiFi? I do not see any processor for it.
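The caching idea above (load the reference rows once, then validate each record with a cache lookup instead of a database query) can be sketched as follows. This is a minimal sketch, not NiFi code: a ConcurrentHashMap stands in for whatever distributed cache would be used, and the class name, the "id,payload" record layout, and the sample data are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical sketch: an in-memory map stands in for a distributed cache.
// Assumed record layout: "id,payload", where id is the lookup key.
public class ValidationSketch {

    // Reference rows loaded once (e.g. from the lookup database), keyed by id.
    private final Map<String, String> referenceCache = new ConcurrentHashMap<>();

    public void load(Map<String, String> referenceRows) {
        referenceCache.putAll(referenceRows);
    }

    // A record is valid if its id (first CSV field) is present in the cache.
    public boolean isValid(String csvRecord) {
        String id = csvRecord.split(",", 2)[0].trim();
        return referenceCache.containsKey(id);
    }

    // Keep only records that pass the lookup: one cache hit per record, no DB query.
    public List<String> validate(List<String> records) {
        return records.stream().filter(this::isValid).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ValidationSketch v = new ValidationSketch();
        v.load(Map.of("1001", "ACTIVE", "1002", "ACTIVE"));
        List<String> valid = v.validate(List.of("1001,alice", "9999,mallory", "1002,bob"));
        System.out.println(valid); // prints [1001,alice, 1002,bob]
    }
}
```

The design choice is simply to trade one database round trip per record for one bulk load plus cheap key lookups; in a real flow the cache would need to be shared across nodes, which is what the question about Redis or a distributed cache is getting at.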
Appreciate your help.
Thanks & Regards,
Sudeep

On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <matt.clarke.138@gmail.com> wrote:

Sudeep,

I was a little off on my second scenario. The DetectDuplicate processor
uses the DistributedMapCache service all on its own. Files that are routed through it are loaded
into the cache if they do not already exist there; if they do already exist, they
are routed to duplicate. The PutDistributedMapCache processor was a community contribution,
and there are currently no processors that make use of the information it caches.
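The duplicate check described above (insert the key if it is not cached yet, otherwise route to duplicate) amounts to an atomic "put if absent". The sketch below models that behaviour only; it is not NiFi's implementation. A ConcurrentHashMap stands in for the DistributedMapCache service, and the class name, route names, and the assumption that a key such as a content hash is computed upstream are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model of the duplicate check, NOT NiFi code. A local
// ConcurrentHashMap stands in for the DistributedMapCache service, and the
// key (e.g. a content hash) is assumed to be computed upstream.
public class DuplicateCheckSketch {

    public enum Route { NON_DUPLICATE, DUPLICATE }

    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // Atomically cache the key if absent and decide where the item is routed.
    public Route route(String key, String description) {
        // putIfAbsent returns null when the key was not cached before this call.
        String previous = cache.putIfAbsent(key, description);
        return previous == null ? Route.NON_DUPLICATE : Route.DUPLICATE;
    }

    public static void main(String[] args) {
        DuplicateCheckSketch d = new DuplicateCheckSketch();
        System.out.println(d.route("sha256:abc", "file-1")); // NON_DUPLICATE
        System.out.println(d.route("sha256:abc", "file-2")); // DUPLICATE
        System.out.println(d.route("sha256:def", "file-3")); // NON_DUPLICATE
    }
}
```

Doing the check and the insert in a single atomic call matters: a separate "get, then put" would let two concurrent copies of the same data both be treated as new.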

We should probably build a processor that makes use of the data that can be
loaded by the PutDistributedMapCache processor. Is there a particular use case you are trying
to solve where this would be applicable?
Thanks,
Matt
On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <matt.clarke.138@gmail.com> wrote:

Sudeep,

The DistributedMapCache is typically used by some of the ingest-type processors (GetHBase,
ListHDFS, and ListSFTP) to prevent consuming duplicate data. NiFi uses the service to keep a
listing of what has been consumed so the same files are not consumed multiple times. The
service can also be used to detect whether duplicate data already exists within a NiFi instance
or cluster. This would be the scenario where some source is pushing data to your NiFi and
perhaps pushes the same data more than once. You want to catch these duplicates so you can
kick them out of your flow. For this you would use the PutDistributedMapCache processor to
cache all incoming data and then use the DetectDuplicate processor to find those duplicates.
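The "keep a listing of what has been consumed" idea above can be illustrated with a small sketch. It is an assumption-laden model, not how ListHDFS or ListSFTP actually store their state: here each file is identified only by its path, a concurrent set stands in for the shared cache, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of "remember what was already consumed" for a
// listing-style ingest. Assumption: a file is identified by its path alone;
// a concurrent set stands in for the shared cache a cluster would use.
public class ListingStateSketch {

    private final Set<String> consumed = ConcurrentHashMap.newKeySet();

    // Return only paths not seen in earlier listings, and remember them.
    public List<String> newFiles(List<String> listing) {
        List<String> fresh = new ArrayList<>();
        for (String path : listing) {
            if (consumed.add(path)) { // add() returns false if already present
                fresh.add(path);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        ListingStateSketch s = new ListingStateSketch();
        System.out.println(s.newFiles(List.of("/in/a.csv", "/in/b.csv"))); // [/in/a.csv, /in/b.csv]
        System.out.println(s.newFiles(List.of("/in/b.csv", "/in/c.csv"))); // [/in/c.csv]
    }
}
```

Keying on the path alone means a file that is modified in place is never picked up again; a real processor would likely need to factor in something like a modification timestamp, which this sketch deliberately leaves out.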

Was there a different use case you were looking to solve using the DistributedMapCache
service?
Thanks,
Matt
On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <sudeepshekharm@gmail.com> wrote:

Hi,
I want to cache some data to be used in a NiFi flow. I can see the PutDistributedMapCache processor
in the documentation, which saves key-value pairs in the DistributedMapCache for NiFi, but I do
not see any processor to read this data. How can I read data from the DistributedMapCache in
my data flow?


Thanks & Regards,
Sudeep Shekhar Mishra


-- 
Thanks & Regards,
Sudeep Shekhar Mishra
+91-9167519029
sudeepshekharm@gmail.com
