nifi-users mailing list archives

From Russell Bateman <r...@windofkeltia.com>
Subject Re: Writing back through a python stream callback when the flowfile content is a mix of character and binary
Date Fri, 03 Feb 2017 00:14:54 GMT
There is also a /SplitContent/ processor. Assuming you can recognize the 
boundaries of the different data types, you can split them up into 
separate flowfiles. Then you /MergeContent/ them back together later.
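This suggestion only works when a boundary can actually be recognized in the bytes. The sketch below is a rough illustration of the round trip in plain Python 3, not the NiFi processors themselves. SplitContent splits raw content on a byte sequence, and MergeContent rejoins the pieces with the same sequence as a demarcator. `BOUNDARY`, `split_content`, and `merge_content` are hypothetical names chosen for illustration.

```python
from typing import List

# Hypothetical boundary; a real flow would split on whatever byte sequence
# actually separates the text and binary parts of the content.
BOUNDARY = b"\x00--PART--\x00"


def split_content(data: bytes, boundary: bytes = BOUNDARY) -> List[bytes]:
    """Split raw content on a byte boundary, dropping the boundary itself."""
    return data.split(boundary)


def merge_content(parts: List[bytes], boundary: bytes = BOUNDARY) -> bytes:
    """Rejoin segments, re-inserting the same boundary as a demarcator."""
    return boundary.join(parts)


if __name__ == "__main__":
    original = (
        b"<tag>text</tag>" + BOUNDARY
        + b"\x89PNG\r\n\x1a\n\xff\xfe" + BOUNDARY
        + "caf\u00e9".encode("utf-8")
    )
    parts = split_content(original)
    # Only the text segments would go to a text-processing script;
    # the binary segment would pass through untouched.
    assert len(parts) == 3
    assert merge_content(parts) == original
```

Without a reliable boundary there is nothing to pass to `split`, which is the limitation raised further down in this thread.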


On 02/02/2017 04:19 PM, James McMahon wrote:
> This is very helpful, Russell, but in my case each file is a mix of
> data types. So even if I determine that the flowfile is a mix, I'd
> still have to be poised to tackle it in my ExecuteScript script. Good
> suggestion, though, and one I can use in other ways in my workflows.
>
> I do hope someone can tell me how to write back from my callback in a
> way that handles all of it. I'd like to better understand this error
> I'm getting, too.  -Jim
>
> On Thu, Feb 2, 2017 at 6:02 PM, Russell Bateman <russ@windofkeltia.com> wrote:
>
>     Could you use /RouteOnContent/ to determine what sort of content
>     you're dealing with, then branch to different /ExecuteScript/
>     processors rigged to different Python scripts?
>
>     Hope this comment is helpful.
>
>
>     On 02/02/2017 03:38 PM, James McMahon wrote:
>>
>>     I have a flowfile that has tagged character information I need to
>>     get at throughout the first few sections of the file. I need to
>>     use regex in Python to select some of those values and to
>>     transform others. I am using an ExecuteScript processor to
>>     execute my python code. Here is my approach:
>>
>>     = = = = =
>>
>>     from org.apache.commons.io import IOUtils
>>     from java.nio.charset import StandardCharsets
>>     from org.apache.nifi.processor.io import StreamCallback
>>     import logging
>>
>>     class PyStreamCallback(StreamCallback):
>>         def __init__(self):
>>             pass
>>
>>         def process(self, inputStream, outputStream):
>>             # What happens to my binary and "extreme" characters when
>>             # they get passed through this step?
>>             stuff = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>>             # ... (transform and pick out select content) ...
>>             # Am I using the wrong functions to put my text, binary, and
>>             # "extreme" characters back on the stream as bytes? What
>>             # should I be doing to handle the variety of data?
>>             outputStream.write(bytearray(stuff.encode('utf-8')))
>>
>>     flowFile = session.get()
>>     if flowFile is not None:
>>         incoming = flowFile.getAttribute('filename')
>>         logging.info('about to process file: %s', incoming)
>>         flowFile = session.write(flowFile, PyStreamCallback())  # line 155 in my code
>>         session.transfer(flowFile, REL_SUCCESS)
>>         session.commit()
>>
>>     = = = = =
>>
>>     When my incoming flowfile is all character content, such as
>>     tagged XML, my code works fine. All the flowfiles that also
>>     contain some binary data and/or characters at the extremes, such
>>     as foreign-language characters, don't work. They error out. I
>>     suspect it has to do with the way I am writing back to the
>>     flowfile stream.
>>
>>     Here is the error I am getting:
>>
>>     org.apache.nifi.processor.exception.ProcessException:
>>     javax.script.ScriptException: TypeError: write(): 1st arg can't
>>     be coerced to int, byte[] in <script> at line number 155
>>
>>     How should I handle the write back to the flowfile in cases where
>>     I have a mix of character and binary?
>>
>>     Note: I must do this programmatically. I tried using a
>>     combination of SplitContent and MergeContent, but I have no
>>     consistent reliable demarcation between the regular text
>>     characters and the other more challenging characters that I can
>>     split on.
>>
>>     All the examples I've found handle purer cases than mine: all
>>     text, for example, or all JSON. I've not yet been able to find an
>>     example that shows me how to write back to the stream for mixed
>>     data. Can you help?
>
>
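On the question of what happens to binary and "extreme" characters: Java's UTF-8 decoding, as used by `IOUtils.toString(inputStream, StandardCharsets.UTF_8)`, replaces byte sequences that are not valid UTF-8 with U+FFFD by default. Those bytes therefore cannot be restored when the string is re-encoded. Foreign-language text that is already properly UTF-8 encoded survives. One way around this is to avoid decoding the whole payload at all and run the regexes on bytes. Another is to decode with ISO-8859-1, which maps every byte to exactly one character and so round-trips losslessly.

The sketch below is plain CPython 3 run outside NiFi, under a few assumptions. The tag names `title` and `ssn` and all helper names are hypothetical. Porting this to ExecuteScript's Jython engine is also untested here; that port would, for example, read with `IOUtils.toByteArray(inputStream)` instead of `toString`.

```python
import re
from typing import List, Tuple

# Hypothetical tags for illustration; substitute the tags in your data.
TITLE_RE = re.compile(rb"<title>(.*?)</title>", re.DOTALL)
SSN_RE = re.compile(rb"<ssn>[^<]*</ssn>")


def transform(data: bytes) -> Tuple[bytes, List[str]]:
    """Pick out and rewrite tagged values while leaving every other byte intact."""
    # Patterns and content are both bytes, so the payload is never decoded;
    # only the extracted values are decoded, and only for reporting.
    titles = [m.group(1).decode("utf-8", errors="replace")
              for m in TITLE_RE.finditer(data)]
    rewritten = SSN_RE.sub(b"<ssn>REDACTED</ssn>", data)
    return rewritten, titles


def lossy_utf8_roundtrip(data: bytes) -> bytes:
    """Mimic decoding everything as UTF-8 (invalid bytes -> U+FFFD) and re-encoding."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


def latin1_roundtrip(data: bytes) -> bytes:
    """ISO-8859-1 maps each byte to one code point, so this round trip is exact."""
    return data.decode("iso-8859-1").encode("iso-8859-1")


if __name__ == "__main__":
    payload = (
        b"<doc><title>" + "Caf\u00e9 \u65e5\u672c".encode("utf-8") + b"</title>"
        b"<ssn>123-45-6789</ssn></doc>"
        b"\x00\x89PNG\xff\xfe\x80"  # arbitrary binary tail
    )
    out, titles = transform(payload)
    assert titles == ["Caf\u00e9 \u65e5\u672c"]
    assert b"REDACTED" in out
    assert out.endswith(b"\x00\x89PNG\xff\xfe\x80")  # binary untouched
    assert latin1_roundtrip(payload) == payload      # lossless
    assert lossy_utf8_roundtrip(payload) != payload  # binary bytes replaced
```

The substitution only touches the matched tag spans, so the UTF-8 title and the binary tail come out byte-for-byte unchanged. The ISO-8859-1 route is the fallback when an API insists on a string.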

