nifi-users mailing list archives

From Andy LoPresto <alopre...@apache.org>
Subject Re: Wildcard character in the Command Argument field of the ExecuteStreamCommand processor
Date Wed, 01 Jun 2016 20:07:10 GMT
Huagen,

Here is an example [1] which does what you are asking. This is a quick hack, and a better
option is probably to use InvokeScriptedProcessor [2], which is explained well by Matt Burgess
on his blog [3][4]. However, with this method, you do not need to modify the internal code
of NiFi at all. You can simply drop the contents of testInvokeListFileProcessor.groovy into
the ExecuteScript processor (or reference the external file), configure the properties from
the test, and run.

Quick overview:

The test (independently) lists the files in a directory for comparison later, sets up an ExecuteScript
processor, configures the necessary properties, sends an incoming flowfile with the directory
path as content and file filter as an attribute, and executes the processor.

The script consumes the incoming flowfile, extracts the directory path from the content, extracts
the file filter from the attribute, sets up a few more hard-coded values (like min/max size
and age), and then invokes ListFile and returns the massaged output as a new flowfile.
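
The script's core logic (directory from the flowfile content, filter from an attribute, then a listing) can be sketched outside NiFi in plain Python. This is only an illustration, not the Groovy script from [1] and not the NiFi API: the function names and the attribute name "file.filter" are hypothetical, and it assumes the filter is a regular expression that must match the whole file name.

```python
import os
import re


def list_files(directory, file_filter):
    """Return the sorted names of regular files in `directory` whose
    entire name matches the regular expression `file_filter`."""
    pattern = re.compile(file_filter)
    return sorted(
        entry.name
        for entry in os.scandir(directory)
        if entry.is_file() and pattern.fullmatch(entry.name)
    )


def handle_flowfile(content, attributes):
    """Emulate the script: read the directory from the flowfile content and
    the filter from an attribute, then return the listing as new content."""
    directory = content.strip()
    # "file.filter" is a hypothetical attribute name; the default skips dotfiles.
    file_filter = attributes.get("file.filter", r"[^.].*")
    return "\n".join(list_files(directory, file_filter))
```

Calling handle_flowfile("/tmp/transfer/16-05-22_00", {"file.filter": r"[^.].*\.gz"}) would return one matching file name per line.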

Again, this is a bit hacky, but it accomplishes what you are asking for. As I said above,
for a production system I would recommend that you write a custom processor using InvokeScriptedProcessor
which does something similar (and doesn’t rely on mocking so much of the framework to interact
with ListFile).

[1] https://github.com/apache/nifi/compare/master...alopresto:groovyListFileDemo?expand=1
[2] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.script.InvokeScriptedProcessor/index.html
[3] http://funnifi.blogspot.com/2016/02/invokescriptedprocessor-hello-world.html
[4] http://funnifi.blogspot.com/2016/02/writing-reusable-scripted-processors-in.html


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On May 31, 2016, at 4:11 PM, Huagen peng <huagen.peng@gmail.com> wrote:
> 
> Andy,
> 
> Could you please explain how to invoke a ListFile processor from the ExecuteScript processor?
> Is it an API call?
> 
> Huagen
> 
>> On May 31, 2016, at 3:23 PM, Andy LoPresto <alopresto@apache.org> wrote:
>> 
>> Huagen,
>> 
>> I understand your issue. You can report a Jira [1] to request those processors be
>> able to accept input, but I don’t believe that change is likely. One solution would be to
>> extend the ListFile processor [2] as it is not a final class, and create your own “DynamicListFile”
>> processor which accepts an incoming flowfile and populates the monitored directory from the
>> flowfile contents. You may encounter issues with this approach if the directory changes, as
>> the internal state maintenance of ListFile may behave unusually.
>> 
>> Another solution would be to use the ExecuteScript [3] processor with a small Groovy
>> script which would accept an incoming flowfile, parse the contents to determine the desired
>> directory, and then configure and invoke the ListFile processor directly, passing the output
>> to one or more new flowfiles.
>> 
>> [1] https://issues.apache.org/jira/browse/NIFI/
>> [2] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/ListFile.java
>> [3] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-scripting-bundle/nifi-scripting-processors/src/main/java/org/apache/nifi/processors/script/ExecuteScript.java
>> 
>> 
>> 
>> Andy LoPresto
>> alopresto@apache.org
>> alopresto.apache@gmail.com
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>> 
>>> On May 31, 2016, at 12:08 PM, Huagen peng <huagen.peng@gmail.com> wrote:
>>> 
>>> Thank you for your suggestion, Andy and Lee.
>>> 
>>> I am aware of the flow using ListFile-FetchFile-HashContent. I didn’t go that
>>> route because the ListFile processor does not accept an upstream processor. I have an upstream
>>> processor, from which I know the directory I want to work with. I ended up passing the directory
>>> name into the ExecuteStreamCommand processor to get ALL the files under the directory. After
>>> that I use SplitText and ExtractText to filter the files with the desired file extension,
>>> and then I use FetchFile and HashContent to finish what I want to do.
>>> 
>>> If ListFile allowed upstream input, it would have made my data flow much easier.
>>> The same goes for the ListSFTP processor.
>>> 
>>> Huagen
>>> 
>>>> On May 31, 2016, at 2:56 PM, Lee Laim <lee.laim@gmail.com> wrote:
>>>> 
>>>> Huagen,
>>>> 
>>>> I had a similar workflow and eventually replaced ExecuteStreamCommand(md5sum)
>>>> with HashContent.
>>>> 
>>>> Using ListFile->FetchFile->HashContent, the resultant hash is placed
>>>> into the flowfile under the attribute ${hash.value}.
>>>> This processor offers ~40 algorithms to choose from, including md5. Compared
>>>> to the ExecuteStreamCommand, the HashContent processor offers a bit more in error-handling
>>>> and lineage traceability in this specific case.
>>>> 
>>>> Thanks,
>>>> -Lee
>>>> 
>>>> 
>>>> On Tue, May 31, 2016 at 11:24 AM, Andy LoPresto <alopresto@apache.org> wrote:
>>>> Huagen,
>>>> 
>>>> The ExecuteStreamCommand is used to run a command against the contents of
>>>> an incoming flowfile. For example, you could have a ListFile processor listing all .gz files
>>>> in the directory and passing them to the ExecuteStreamCommand processor to generate the MD5
>>>> hash of each. In this case, you would not need a wildcard character in the command.
>>>> 
>>>> The configuration for the processors is as follows:
>>>> 
>>>> ListFile:
>>>> 	-Input directory: <the directory where the files are located>
>>>> 	-File Filter: [^\.].*\.gz
>>>> 
>>>> ExecuteStreamCommand:
>>>> 	-Command arguments: ${filename}
>>>> 	-Command path: md5
>>>> 	-Working Directory: <the directory where the files are located>
>>>> 	-Output Destination Attribute: md5hash
>>>> 
>>>> Notes:
>>>> 	-I am using “md5” rather than “md5sum” as I am on Mac OS X.
>>>> 	-You could use the “-n” flag for “md5” to suppress extraneous output
>>>> 	-You could use “${absolute.path}/${filename}” as the command arguments, in which case you would not need to set the working directory
>>>> 
>>>> Andy LoPresto
>>>> alopresto@apache.org
>>>> alopresto.apache@gmail.com
>>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>>> 
>>>>> On May 31, 2016, at 7:02 AM, Huagen peng <huagen.peng@gmail.com> wrote:
>>>>> 
>>>>> Hi, I would like to run an md5sum command on all the *.gz files under
>>>>> a certain directory. However, I keep getting this error:
>>>>> md5sum: stat '/tmp/transfer/16-05-22_00/*.gz': No such file or directory
>>>>> 
>>>>> I tried quoting the * wildcard character and adding a . (dot) or / in front, to
>>>>> no avail. Can I do something like this with the ExecuteStreamCommand processor?
>>>>> 
>>>>> Thanks.
>>>> 
>>>> 
>>> 
>> 
> 
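
For reference, the error in the original question at the bottom of this thread is what one would expect when the `*.gz` argument reaches md5sum literally: wildcard expansion is normally done by a shell, and the message shows the unexpanded pattern. A minimal Python sketch, outside NiFi, of expanding the pattern explicitly and hashing each match (the function name is hypothetical, and this is not how any NiFi processor is implemented):

```python
import glob
import hashlib
import os


def md5_of_matching_files(pattern):
    """Expand `pattern` the way a shell would, then return a dict mapping
    each matching regular file's base name to its MD5 hex digest."""
    digests = {}
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue  # skip directories that happen to match
        md5 = hashlib.md5()
        with open(path, "rb") as handle:
            # Read in chunks so large .gz files are not loaded into memory at once.
            for chunk in iter(lambda: handle.read(65536), b""):
                md5.update(chunk)
        digests[os.path.basename(path)] = md5.hexdigest()
    return digests
```

For example, md5_of_matching_files("/tmp/transfer/16-05-22_00/*.gz") would return one digest per .gz file in that directory.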

