nifi-dev mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: Executing a python script with Execute Stream Command
Date Fri, 05 Jun 2015 12:32:50 GMT
This is in the category of 'what sorts of things can and should we do
to provide type safety of flow configuration all the way to the user
at runtime'.

At the NYC meetup someone asked a similar question which was 'will the
flow let me configure processors to talk to each other that don't make
sense'.  The cases you outline above seem quite doable with an
annotation.  We could extend this concept to include consideration of
attributes on flowfiles that a given processor requires.  I think it
would be limited in terms of being able to prevent the user from
making a flow that doesn't make sense because we don't know the
attributes of a flowfile until they're flowing.  However, we could
create automatic dead-letter queue type behavior if a flowfile ends up
on a connection whose consuming processor does not accept it.
@RequiresAttribute('mime.type','application/json'), for instance, is
something a processor could declare, and if it receives a flowfile
that doesn't have that attribute name and value the framework would
ensure it didn't get picked up by that processor.
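
As a rough illustration of the idea above, here is a minimal Python
sketch of attribute-based acceptance with a dead-letter fallback. It is
not NiFi code: FlowFile, requires_attribute, JsonProcessor and route are
hypothetical names invented for this example, and the real framework
would do this in Java with an annotation.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical stand-in for a flowfile: only a dict of attributes."""
    attributes: dict = field(default_factory=dict)


def requires_attribute(name, value):
    """Hypothetical decorator mirroring the @RequiresAttribute idea."""
    def wrap(cls):
        # Copy so that a subclass never mutates its parent's requirements.
        required = dict(getattr(cls, "required_attributes", {}))
        required[name] = value
        cls.required_attributes = required
        return cls
    return wrap


@requires_attribute("mime.type", "application/json")
class JsonProcessor:
    """Hypothetical processor that only accepts JSON flowfiles."""


def route(flowfile, processor_cls, accepted, dead_letter):
    """Queue the flowfile for the processor only if every required
    attribute is present with the declared value; otherwise dead-letter it."""
    required = getattr(processor_cls, "required_attributes", {})
    ok = all(flowfile.attributes.get(k) == v for k, v in required.items())
    (accepted if ok else dead_letter).append(flowfile)
    return ok
```

The check can only happen once a flowfile actually arrives, which is
why this complements, rather than replaces, validation at design time.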

...there is a lot we can do here.
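
The three input categories proposed in the quoted message below could
be modelled roughly as follows. This is a Python sketch for discussion
only; InputRequirement and is_valid are hypothetical names, not an
existing NiFi API.

```python
from enum import Enum


class InputRequirement(Enum):
    """Hypothetical names for the three categories in the quoted proposal."""
    FORBIDDEN = "A"  # does not expect incoming FlowFiles
    REQUIRED = "B"   # expects incoming FlowFiles
    ALLOWED = "C"    # accepts incoming FlowFiles but does not need them


def is_valid(requirement, incoming_connections):
    """Return True if the number of incoming connections is acceptable
    for a processor that declares the given requirement."""
    if requirement is InputRequirement.FORBIDDEN:
        return incoming_connections == 0
    if requirement is InputRequirement.REQUIRED:
        return incoming_connections > 0
    # ALLOWED: valid either way; the processor would need to be told
    # whether it has input so that it can adapt its behavior.
    return True
```

With a declaration like this, a processor that requires input but has
no incoming connection would show up as invalid instead of silently
doing nothing.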

On Fri, Jun 5, 2015 at 8:22 AM, Mark Payne <markap14@hotmail.com> wrote:
> Steve,
>
> That's great that you guys have gotten this resolved.
>
> Aldrin,
>
> Great call & great work getting that stuff settled.
>
> All,
>
> I think this is a very important usability problem - I'm sure there are
> plenty of other people who will run into similar issues. I think we need
> to add something to the API that allows the developer of a Processor to
> indicate that the Processor fits into 1 of 3 categories:
>
> A) Does not expect incoming FlowFiles (UI should not allow you to even
> create a connection to the Processor; if one exists already, the
> processor should become invalid)
>
> B) Processor does expect incoming FlowFiles (Processor should become
> invalid until it has an incoming connection, just like it does if its
> outgoing connections are not all satisfied)
>
> C) Processor can take incoming FlowFiles but doesn't require them. I
> don't know that we have this use case in any of our Processors, but it
> is a valid use case, I think. In this case, the API needs to provide
> information to the Processor (via the ProcessContext) about whether or
> not it has any incoming connections.
> I believe I may have already created a ticket for this, but not sure.
>
> Does anybody have any thoughts on this?
>
>
> ----------------------------------------
>> Date: Fri, 5 Jun 2015 08:04:55 -0400
>> Subject: Re: Executing a python script with Execute Stream Command
>> From: stephen.pietrasko@g2-inc.com
>> To: dev@nifi.incubator.apache.org
>>
>> Aldrin,
>>
>> I want to thank you for your help. ExecuteProcess was the solution to my
>> problem.
>>
>> Thanks to everyone who helped.
>>
>> -Steve
>>
>> On Thu, Jun 4, 2015 at 2:06 PM, Aldrin Piri <aldrinpiri@gmail.com> wrote:
>>
>>> Steve,
>>>
>>> I was able to mock up a flow myself and can provide a template to share
>>> with you that acts as I would anticipate. All of this is coming with the
>>> heavy caveat that I am not a Python master by any means.
>>>
>>> Before that, however, reading through the history, can you clarify if you
>>> are providing any input to the processor? Based on the context and noted
>>> behavior of the tasks/time increasing, my suspicion is that you are not,
>>> and the intent of the processor is not aligning with your expectations of
>>> this processor acting as a means of ingest into the flow. To that end, the
>>> intent of the ExecuteStreamCommand processor as designed is to
>>> "... execute[s] an external command on the contents of a flow file, and
>>> create[s] a new flow file with the results of the command." Accordingly,
>>> if there is no input, the processor just returns after being allotted an
>>> execution cycle.
>>>
>>> I believe you may be after the ExecuteProcess processor which could be
>>> adapted to carry out execution without the need for input.
>>>
>>> Let us know if that is the case, if not, any additional clues will help us
>>> get to the issue.
>>>
>>> Thanks!
>>>
>>> --aldrin
>>>
>>> On Thu, Jun 4, 2015 at 12:51 PM, Stephen Pietrasko <
>>> stephen.pietrasko@g2-inc.com> wrote:
>>>
>>>> Mark,
>>>>
>>>> Unfortunately that did not work. The Tasks/Time keep increasing but
>>>> nothing else.
>>>>
>>>> Thanks,
>>>> Steve
>>>>
>>>> On Thu, Jun 4, 2015 at 12:37 PM, Mark Payne <markap14@hotmail.com> wrote:
>>>>
>>>>> Stephen,
>>>>>
>>>>> The "Command Argument" property expects the arguments to be delimited
>>>>> by semi-colons, rather than spaces.
>>>>>
>>>>> Try changing that property to "nameofscript.py;-j;multiline" and see
>>>>> if that works for you.
>>>>>
>>>>> Thanks
>>>>> -Mark
>>>>>
>>>>> ----------------------------------------
>>>>>> Date: Thu, 4 Jun 2015 12:34:26 -0400
>>>>>> Subject: Re: Executing a python script with Execute Stream Command
>>>>>> From: stephen.pietrasko@g2-inc.com
>>>>>> To: dev@nifi.incubator.apache.org
>>>>>>
>>>>>> Mark,
>>>>>>
>>>>>> The properties I am using are as follows:
>>>>>>
>>>>>> Command Argument: nameofscript.py -j multine
>>>>>> Command Path: python
>>>>>> Working Directory: /opt/dev/
>>>>>>
>>>>>>
>>>>>> Jimmy,
>>>>>>
>>>>>> Not exactly sure what you are asking with your question "Does the
>>>>>> python script that you run from NiFi have a select set of Python
>>>>>> packages you can leverage in your python script. Is it at all
>>>>>> possible to add additional python packages?"
>>>>>>
>>>>>> Here is a sanitized version of the script. Are you asking if I can
>>>>>> import more packages in my script? If so, yes, I can do that.
>>>>>>
>>>>>> http://pastebin.com/peSCkx6j
>>>>>>
>>>>>>
>>>>>> Thank you guys.
>>>>>>
>>>>>> -Steve
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 4, 2015 at 9:57 AM, Mark Payne <markap14@hotmail.com> wrote:
>>>>>>
>>>>>>> Stephen,
>>>>>>>
>>>>>>> Your understanding of the properties seems correct. Can you provide
>>>>>>> the properties that you're using to configure the processor?
>>>>>>>
>>>>>>> Thanks
>>>>>>> -mark
>>>>>>>
>>>>>>> ----------------------------------------
>>>>>>>> Date: Thu, 4 Jun 2015 09:51:46 -0400
>>>>>>>> Subject: Executing a python script with Execute Stream Command
>>>>>>>> From: stephen.pietrasko@g2-inc.com
>>>>>>>> To: dev@nifi.incubator.apache.org; rob.weiss@g2-inc.com
>>>>>>>>
>>>>>>>> All,
>>>>>>>>
>>>>>>>> I am trying to configure the Execute Stream Command processor to
>>>>>>>> execute a python script and have the output sent to a queue with
>>>>>>>> PutJMS.
>>>>>>>>
>>>>>>>> I'm having a bit of difficulty though. I've been looking at this
>>>>>>>> previous email chain which is similar to my issue.
>>>>>>>>
>>>>>>>> https://www.mail-archive.com/dev@nifi.incubator.apache.org/msg01578.html
>>>>>>>>
>>>>>>>> The script runs and sends the output to sys.stdout.write but when I
>>>>>>>> try and have NiFi run the script I see no bytes in or out which
>>>>>>>> means nothing is passed to the queue.
>>>>>>>>
>>>>>>>> Would this be an issue with the output being sent to stdout or a
>>>>>>>> property issue with ExecuteStreamCommand?
>>>>>>>>
>>>>>>>> I have tried several configurations of the property fields. This is
>>>>>>>> my general understanding of each field and what they should be:
>>>>>>>>
>>>>>>>> Command Argument: name of script and arguments
>>>>>>>> Command Path: python
>>>>>>>> Working Directory: Directory where script is located.
>>>>>>>>
>>>>>>>> Any help would be greatly appreciated.
>>>>>>>>
>>>>>>>> --
>>>>>>>> V/R
>>>>>>>>
>>>>>>>> Stephen M. Pietrasko
>>>>>>>> Security Engineer
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>> --
>> V/R
>>
>> Stephen M. Pietrasko
>> Security Engineer
>> G2-Inc
>> 301-575-5142
>> www.g2-inc.com
>
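
On the delimiter point raised earlier in the thread: the sketch below
shows why a space-separated "Command Argument" value would end up as a
single argument if the property is split only on semi-colons. It is a
Python illustration under that assumption, not NiFi's implementation,
and build_argv is a hypothetical helper.

```python
def build_argv(command_path, command_arguments, delimiter=";"):
    """Build an argument vector from a delimited property value.

    Assumes the value is split on the delimiter only, with no
    shell-style handling of spaces or quotes.
    """
    parts = [p for p in command_arguments.split(delimiter) if p]
    return [command_path] + parts


# Semi-colon delimited: three separate arguments follow the command.
semi = build_argv("python", "nameofscript.py;-j;multiline")

# Space delimited: the whole string is passed as one argument.
spaced = build_argv("python", "nameofscript.py -j multiline")
```

As the thread shows, the delimiter was not the root cause in this case;
the processor simply had no incoming flowfile to operate on.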
