nifi-users mailing list archives

From James McMahon <jsmcmah...@gmail.com>
Subject Re: Writing back through a python stream callback when the flowfile content is a mix of character and binary
Date Thu, 02 Feb 2017 23:19:58 GMT
This is very helpful, Russell, but in my case each file is a mix of data
types. So even if I determine that the flowfile is a mix, I'd still have to
be poised to tackle it in my ExecuteScript script. Good suggestion, though,
and one I can use in other ways in my workflows.

I do hope someone can tell me what I can do in my callback's write-back to
handle all of it. I'd like to better understand this error I'm getting, too.
 -Jim

On Thu, Feb 2, 2017 at 6:02 PM, Russell Bateman <russ@windofkeltia.com>
wrote:

> Could you use *RouteOnContent* to determine what sort of content you're
> dealing with, then branch to different *ExecuteScript* processors rigged
> to different Python scripts?
>
> Hope this comment is helpful.
>
>
> On 02/02/2017 03:38 PM, James McMahon wrote:
>
> I have a flowfile that has tagged character information I need to get at
> throughout the first few sections of the file. I need to use regex in
> python to select some of those values and to transform others. I am using
> an ExecuteScript processor to execute my python code. Here is my approach:
>
>
>
> = = = = =
>
> class PyStreamCallback(StreamCallback):
>
>     def __init__(self):
>         pass
>
>     def process(self, inputStream, outputStream):
>
>         # What happens to my binary and extreme chars when they get passed
>         # through this step?
>         stuff = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>
>         # ... (transform and pick out select content) ...
>
>         # Am I using the wrong functions to put my text chars, my binary and
>         # my extreme chars back on the stream as a byte stream? What should I
>         # be doing to handle the variety of data?
>         outputStream.write(bytearray(stuff.encode('utf-8')))
>
>
>
> flowFile = session.get()
>
> if flowFile is not None:
>
>     incoming = flowFile.getAttribute('filename')
>
>     logging.info('about to process file: %s', incoming)
>
>     flowFile = session.write(flowFile, PyStreamCallback())   # line 155 in my code
>
>     session.transfer(flowFile, REL_SUCCESS)
>
>     session.commit()
>
>
>
> = = = = =
>
>
>
> When my incoming flowfile is all character content - such as tagged xml -
> my code works fine. All the flowfiles that also contain some binary data
> and/or characters at the extremes such as foreign language characters don’t
> work. They error out. I suspect it has to do with the way I am writing back
> to the flowfile stream.
>
>
>
> Here is the error I am getting:
>
> org.apache.nifi.processor.exception.ProcessException:
> javax.script.ScriptException: TypeError: write(): 1st arg can't be
> coerced to int, byte[] in <script> at line number 155
>
>
>
> How should I handle the write back to the flowfile in cases where I have a
> mix of character and binary?
>
>
> Note: I must do this programmatically. I tried using a combination of
> SplitContent and MergeContent, but I have no consistent reliable
> demarcation between the regular text characters and the other more
> challenging characters that I can split on.
>
> All the examples I've found handle purer cases than mine: all text, for
> example, or all JSON. I've not yet been able to find an example that shows
> me how to write back to the stream for mixed data. Can you help?
>
>
>
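
One possible approach to the mixed-content question in this thread is
sketched below. It is a minimal sketch, not a confirmed fix for the TypeError
quoted above. The idea is to treat the content as bytes, decode with
ISO-8859-1 (latin-1) instead of UTF-8 so that every byte maps to exactly one
character, run the regex work on that text, and encode back with the same
charset. The function name transform_mixed, the <id> and <author> tags, and
the sample data are all hypothetical. The code is plain Python and does not
use the NiFi API. Inside ExecuteScript the bytes would presumably come from
something like IOUtils.toByteArray(inputStream), which is an assumption not
exercised here.

```python
import re


def transform_mixed(data):
    """Select one value and rewrite another in mixed text/binary bytes."""
    # latin-1 maps each byte 0x00-0xFF to exactly one code point, so
    # decode followed by encode is a lossless round trip for any bytes.
    text = data.decode('latin-1')

    # Hypothetical selection: pull the digits out of an <id> tag.
    match = re.search(r'<id>([0-9]+)</id>', text)
    selected_id = match.group(1) if match else None

    # Hypothetical transformation: blank out the <author> value.
    text = re.sub(r'<author>.*?</author>', '<author>REDACTED</author>', text)

    # Encoding with the same charset restores the untouched bytes exactly.
    return selected_id, text.encode('latin-1')


# Tagged text followed by bytes that are not valid UTF-8.
sample = b'<id>42</id><author>jim</author>' + b'\x00\xff\xfe\x80' + b'tail'
selected_id, result = transform_mixed(sample)

# For comparison: decoding as UTF-8 with replacement does not round-trip,
# because each malformed byte becomes U+FFFD before it is re-encoded.
lossy = sample.decode('utf-8', 'replace').encode('utf-8')
```

In the latin-1 view a non-ASCII UTF-8 character appears as two or more
separate characters, so patterns are safest when they match on ASCII tags. A
captured value that is really UTF-8 text can be recovered afterwards with
value.encode('latin-1').decode('utf-8').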
