nifi-users mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: Trigger a processor if all files in a folder are processed
Date Fri, 04 Dec 2015 15:34:16 GMT
Manish,

As you have laid it out, I think it would be harder than it should be.
However, that is in part because the design does not take advantage of
NiFi's strengths in building something reliable for this.
The JS being called: it may be possible to simply implement its logic
so that it pulls data directly into NiFi, rather than writing files into
a dir and then having NiFi grab them.  It is rare that we can count on
pulling data from a directory as a one-time event, such that once the
directory is empty we know we're 'done'.  So it is certainly better to
avoid that and pull data into NiFi directly.

Once data is in NiFi:
- Validation against the schema is straightforward using 'ValidateXml'.
- Delivery to HDFS is easy.
- Kicking off some process once data is delivered to HDFS is easy.
However, the part we don't have a good answer for (today) is how to
kick off that job only once all items of a correlated group have
passed the HDFS delivery point.  It is definitely solvable, and it is
something that we should tackle.  I totally agree that it is a great
use case.
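
To make the missing piece concrete, here is a minimal sketch of the
bookkeeping involved, written in plain Python outside of NiFi.  It is
not a NiFi API or an existing processor.  The names BatchTracker,
open_batch and mark_delivered are hypothetical, and it assumes the
number of items in each group is known up front, which the flow
described below does not currently provide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class BatchTracker:
    """Hypothetical helper: calls on_complete once per batch, after every
    expected item of that batch has been reported as delivered."""

    on_complete: Callable[[str], None]
    _expected: Dict[str, int] = field(init=False, default_factory=dict)
    _delivered: Dict[str, Set[str]] = field(init=False, default_factory=dict)
    _fired: Set[str] = field(init=False, default_factory=set)

    def open_batch(self, batch_id: str, expected_count: int) -> None:
        # Assumption: whoever produces the batch knows how many items it has.
        self._expected[batch_id] = expected_count
        self._delivered.setdefault(batch_id, set())
        self._maybe_fire(batch_id)

    def mark_delivered(self, batch_id: str, item_name: str) -> None:
        # A set makes repeated reports of the same item harmless.
        self._delivered.setdefault(batch_id, set()).add(item_name)
        self._maybe_fire(batch_id)

    def _maybe_fire(self, batch_id: str) -> None:
        # Do nothing if already triggered or if the batch size is unknown.
        if batch_id in self._fired or batch_id not in self._expected:
            return
        if len(self._delivered[batch_id]) >= self._expected[batch_id]:
            self._fired.add(batch_id)
            self.on_complete(batch_id)


if __name__ == "__main__":
    started = []  # stands in for "submit the downstream job"
    tracker = BatchTracker(on_complete=started.append)
    tracker.open_batch("hourly-run", expected_count=2)
    tracker.mark_delivered("hourly-run", "a.xml")
    tracker.mark_delivered("hourly-run", "a.xml")  # duplicate report
    tracker.mark_delivered("hourly-run", "b.xml")
    print(started)
```

The callback runs only when the second distinct item arrives, and it
does not run again for later reports on the same batch.  Knowing the
expected count is the hard part in practice, which is why pulling the
data into NiFi directly is preferable to watching a directory drain.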

That said, what do you think about converting the XML to Avro directly
in NiFi itself?  We don't have a processor out of the box for it, but
you clearly already have the code for it, so putting that into a
processor should be quite straightforward.

Thanks
Joe


On Fri, Dec 4, 2015 at 10:25 AM, Manish Gupta 8 <mgupta50@sapient.com> wrote:
> Can someone please provide a workaround for this scenario.
>
>
>
> Thanks,
>
> Manish
>
>
>
>
>
> From: Manish Gupta 8 [mailto:mgupta50@sapient.com]
> Sent: Thursday, December 03, 2015 2:18 PM
> To: users@nifi.apache.org
> Subject: Trigger a processor if all files in a folder are processed
>
>
>
> Hi,
>
>
>
> I have a scenario where I want to trigger / execute one processor once
> GetFile has pulled all the files from a folder and the last processor has
> finished its execution. How can I implement this in Nifi?
>
>
>
> Basically what I am trying to do is:
>
> ({Execute Process to call some phantomJS script to download few files in a
> directory}) : runs every 1 hour
>
> ({Get File (xml)} -> {Validate with XSD} -> {Put HDFS}): checks for files
> continuously
>
>
>
> Now after this flow is complete i.e. all files are available in HDFS, I want
> to submit my XML to Avro conversion MR job using Oozie REST. How can I make
> sure that my Invoke HTTP processor executes only once and that too after all
> files have successfully landed in HDFS?
>
>
>
> Thanks,
>
> Manish
>
>
