nifi-users mailing list archives

From sudeep mishra <sudeepshekh...@gmail.com>
Subject Re: PutDistributedMapCache
Date Thu, 14 Jan 2016 03:57:26 GMT
Is it possible to build the code for only a particular processor? Just
curious if we can build and deploy a particular processor in an existing
NiFi environment.

On Wed, Jan 13, 2016 at 9:33 PM, sudeep mishra <sudeepshekharm@gmail.com>
wrote:

> Thanks Joe. I will try out the patch.
>
> On Wed, Jan 13, 2016 at 9:31 PM, Joe Percivall <joepercivall@yahoo.com>
> wrote:
>
>> You would need to clone the nifi source from github and then apply the
>> patch using git.
>>
>> Here is how to clone a repo:
>> https://help.github.com/articles/cloning-a-repository/
>> Along with the nifi repo itself: https://github.com/apache/nifi
>>
>> and how to apply a patch:
>> http://makandracards.com/makandra/2521-git-how-to-create-and-apply-patches
>>
>> Let me know if you have any other questions,
>> Joe
>> - - - - - -
>> Joseph Percivall
>> linkedin.com/in/Percivall
>> e: joepercivall@yahoo.com
>>
>>
>>
>> On Wednesday, January 13, 2016 10:56 AM, sudeep mishra <
>> sudeepshekharm@gmail.com> wrote:
>>
>>
>>
>> Thank you very much Joe.
>>
>> Can you please let me know how I can use the .patch file? I am running NiFi
>> from the binaries. Do I need to set up the source code and build it with
>> the patch applied?
>>
>> Thanks & Regards,
>>
>> Sudeep
>>
>>
>> On Wed, Jan 13, 2016 at 9:02 PM, Joe Percivall <joepercivall@yahoo.com>
>> wrote:
>>
>> Hello Sudeep,
>> >
>> >I put up a patch on the GetDistributedMapCache ticket[1]. Let me know
>> what you think.
>> >
>> >The PutDistributedMapCache processor and GetDistributedMapCache work
>> with the data as a byte[] so it should be format agnostic. That being said
>> it will be up to you to know what is in there in order to use it later.
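
To illustrate the byte[] point above, here is a minimal sketch in Python. It is not the NiFi API: InMemoryMapCache and the "record-42" key are hypothetical stand-ins, and JSON is just one assumed format. The cache only ever sees bytes, so the writer and the reader have to agree on the encoding themselves.

```python
import json


class InMemoryMapCache:
    """Hypothetical in-memory stand-in for a distributed map cache (not NiFi code)."""

    def __init__(self):
        self._entries = {}

    def put(self, key, value):
        # The cache is format agnostic: it accepts raw bytes and nothing else.
        if not isinstance(value, bytes):
            raise TypeError("cache values must be bytes")
        self._entries[key] = value

    def get(self, key):
        # Returns the stored bytes unchanged, or None when the key is absent.
        return self._entries.get(key)


cache = InMemoryMapCache()
record = {"id": 42, "status": "valid"}

# The writer chooses the format (JSON here) and serializes to bytes.
cache.put("record-42", json.dumps(record).encode("utf-8"))

# The reader gets opaque bytes back and must already know they hold JSON.
raw = cache.get("record-42")
decoded = json.loads(raw.decode("utf-8"))
```

The same put/get calls would work for Avro or any other format, as long as whatever reads the entry later decodes it the same way it was encoded.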
>> >
>> >[1] https://issues.apache.org/jira/browse/NIFI-1382
>> >
>> >Joe
>> >- - - - - -
>> >Joseph Percivall
>> >linkedin.com/in/Percivall
>> >e: joepercivall@yahoo.com
>> >
>> >
>> >
>> >
>> >On Tuesday, January 12, 2016 11:34 PM, sudeep mishra <
>> sudeepshekharm@gmail.com> wrote:
>> >
>> >
>> >
>> >Thanks Joe.
>> >
>> >I do not have a specific configuration in mind yet, as I am still exploring
>> NiFi. I do think it would be helpful to let users store and retrieve
>> cache values in different formats (JSON, Avro, etc.).
>> >
>> >Thanks & Regards,
>> >
>> >Sudeep
>> >
>> >
>> >
>> >
>> >
>> >On Tue, Jan 12, 2016 at 9:15 PM, Joe Percivall <joepercivall@yahoo.com>
>> wrote:
>> >
>> >Hello Sudeep,
>> >>
>> >>
>> >>We are currently lacking a "GetDistributedMapCache" processor that
>> corresponds to the "PutDistributedMapCache". I created a ticket[1] and will
>> be working on it today. If you have any comments, configuration
>> suggestions, etc. please let me know or comment on the ticket.
>> >>
>> >>
>> >>[1] https://issues.apache.org/jira/browse/NIFI-1382
>> >>
>> >>Joe
>> >>- - - - - -
>> >>Joseph Percivall
>> >>linkedin.com/in/Percivall
>> >>e: joepercivall@yahoo.com
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>On Tuesday, January 12, 2016 9:46 AM, sudeep mishra <
>> sudeepshekharm@gmail.com> wrote:
>> >>
>> >>
>> >>
>> >>Thanks Matt.
>> >>
>> >>
>> >>In my data flow I am expected to perform certain validations on data. I
>> am loading some SQL Server data into HDFS using Sqoop (not part of the NiFi
>> flow). For each record in the HDFS file I have to query another database and
>> then save the validated record back to HDFS, where it will be processed by
>> some Spark jobs.
>> >>
>> >>
>> >>Since I have to run a query for each record, I was planning to cache the
>> database records against which I have to validate the HDFS data. That is why
>> I was evaluating the DistributedCacheServer, but it looks like its purpose is
>> different. Alternatively, can we integrate Redis or another distributed
>> cache with NiFi? I do not see any processor for it.
>> >>
>> >>
>> >>Appreciate your help.
>> >>
>> >>
>> >>Thanks & Regards,
>> >>
>> >>
>> >>Sudeep
>> >>
>> >>
>> >>
>> >>
>> >>On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <
>> matt.clarke.138@gmail.com> wrote:
>> >>
>> >>Sudeep,
>> >>>       I was a little off on my second scenario. The DetectDuplicate
>> processor uses the distributed cache service all on its own. Files that are
>> routed through it are loaded into the cache if they do not already exist in
>> the cache. If they do already exist, they are routed to duplicate. The
>> PutDistributedMapCache processor was a community contribution, and there
>> are currently no processors that make use of the info that it caches.
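
The check-then-cache behaviour described above can be sketched roughly as follows. This is plain Python, not NiFi code: the route function, the dict used as the cache, the file names, and the relationship labels are all assumptions made for illustration.

```python
def route(cache, identifier):
    """Return the relationship an item would be routed to in this toy model."""
    if identifier in cache:
        # Already seen: the item is routed to "duplicate".
        return "duplicate"
    # First sighting: load it into the cache so later copies are caught.
    cache[identifier] = True
    return "non-duplicate"


seen = {}
routes = [route(seen, name) for name in ["a.csv", "b.csv", "a.csv"]]
```

The lookup and the insert both happen in the same step, which is why no separate "put" processor is needed for this scenario.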
>> >>>
>> >>>       We should probably build a processor that would make use of the
>> data that can be loaded by the PutDistributedMapCache processor.  Is there a
>> particular use case you are trying to solve where this would be applicable?
>> >>>
>> >>>
>> >>>Thanks,
>> >>>Matt
>> >>>
>> >>>
>> >>>On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <
>> matt.clarke.138@gmail.com> wrote:
>> >>>
>> >>>Sudeep,
>> >>>>    The DistributedMapCache is typically used to prevent the
>> consumption of duplicate data by some of the ingest type processors
>> (GetHBase, ListHDFS, and ListSFTP).  NiFi uses the service to keep a
>> listing of what has been consumed so the same files are not consumed
>> multiple times. The Service can also be used to detect if duplicate data
>> already exists within a NiFi Instance or cluster. This would be the
>> scenario where some source is pushing data to your NiFi and perhaps they
>> push the same data more than once. You want to catch these duplicates so
>> you can perhaps kick them out of your flow. For this you would use the
>> PutDistributedMapCache processor to cache all incoming data and then use the
>> DetectDuplicate processor to find those duplicates.
>> >>>>
>> >>>>    Was there a different use case you were looking to solve using
>> the Distributed cache service?
>> >>>>
>> >>>>
>> >>>>Thanks,
>> >>>>Matt
>> >>>>
>> >>>>
>> >>>>On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <
>> sudeepshekharm@gmail.com> wrote:
>> >>>>
>> >>>>Hi,
>> >>>>>
>> >>>>>
>> >>>>>I can cache some data to be used in a NiFi flow. I can see the
>> processor PutDistributedMapCache in the documentation, which saves key-value
>> pairs in the DistributedMapCache for NiFi, but I do not see any processor to read
>> this data. How can I read data from DistributedMapCache in my data flow?
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>Thanks & Regards,
>> >>>>>
>> >>>>>
>> >>>>>Sudeep Shekhar Mishra
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >>
>> >>--
>> >>
>> >>Thanks & Regards,
>> >>
>> >>
>> >>Sudeep Shekhar Mishra
>> >>
>> >>
>> >>+91-9167519029
>> >>sudeepshekharm@gmail.com



-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekharm@gmail.com
