nifi-users mailing list archives

From James McMahon <jsmcmah...@gmail.com>
Subject Re: Writing back through a python stream callback when the flowfile content is a mix of character and binary
Date Tue, 07 Feb 2017 21:10:39 GMT
There are still issues even without the bytearray() call, Matt. I tried using a
json function in its place to format my data as JSON, and the error still occurs.

I still have this problem but have implemented a temporary workaround. As it
turns out, I don't think it is a very good one. As one of our previous
collaborators suggested, I use SplitContent, operate on just the text data in
the header, and then use MergeContent to bring the pieces back together. The
problem is that there is a limit on the number of "split" flowfiles you can
try to bring back together at any one time. To raise that limit to 15-20K, you
need to increase a parameter in nifi.properties. If I bump it up to 20,000,
say, then as soon as fragment 20,001 appears, the oldest one is rolled off to
Failure. I can't have this happen. My flow volume is much too high to throttle
it down to this level. I got it to work for the time being by restricting my
ListFile ahead of my FetchFile, but this approach will not scale to my
customer's needs.
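
For reference, one possible way to avoid the split-and-merge round trip is to
decode the whole payload as ISO-8859-1 instead of UTF-8. That charset maps every
byte value to exactly one character, so the decode cannot fail and the encode
restores the untouched bytes exactly. The sketch below is plain Python, not a
tested ExecuteScript script; the tag names, the function name rewrite_header,
and the assumption that the text portions are UTF-8 are all hypothetical. Inside
ExecuteScript the analogous read would presumably be
IOUtils.toString(inputStream, StandardCharsets.ISO_8859_1), which has not been
verified here.

```python
import re

# Hypothetical tag names, chosen only for illustration.
STATUS_TAG = re.compile(r"<status>.*?</status>")
TITLE_TAG = re.compile(r"<title>(.*?)</title>")


def rewrite_header(raw):
    """Return (new_bytes, title) for a payload mixing tagged text and binary."""
    # ISO-8859-1 assigns one code point to every byte value 0-255, so this
    # decode cannot fail and the matching encode restores the exact bytes.
    text = raw.decode("iso-8859-1")

    title = None
    match = TITLE_TAG.search(text)
    if match:
        # The captured value is still raw bytes in disguise; recover the real
        # characters by re-encoding and decoding as UTF-8 (assumed encoding).
        title = match.group(1).encode("iso-8859-1").decode("utf-8")

    # Replace only the first matched span; every other byte passes through.
    text = STATUS_TAG.sub("<status>processed</status>", text, count=1)
    return text.encode("iso-8859-1"), title


# Example payload: UTF-8 tagged text followed by arbitrary binary bytes.
binary_tail = bytes(bytearray([0x00, 0xFF, 0xFE, 0x80, 0x0A]))
payload = "<title>Ünïcödé</title><status>new</status>".encode("utf-8") + binary_tail
new_bytes, title = rewrite_header(payload)
```

The replacement text must itself stay within ISO-8859-1 (plain ASCII is the
safe choice); any non-ASCII replacement would need to be encoded as UTF-8 and
decoded as ISO-8859-1 before substitution.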

I hope this makes some modest sense. I am typing all this in here from home
without my NiFi flow and details in front of me. Cheers and thanks again for
any future insights you may have. -Jim Mc.

On Fri, Feb 3, 2017 at 10:39 PM, Matt Burgess <mattyb149@apache.org> wrote:

> James,
>
> I haven't had a chance to dig into this yet, but one thing I noticed
> about your script was an issue identified by Bryan Rosander (NiFi
> committer and all-around good guy :) as the probable cause of the
> TypeError, namely the calling of bytearray() after encode() (the
> latter of which already returns a byte array) [1]. Does removing the
> call to bytearray() fix your script, or are there still issues with
> decoding the input stream?
>
> Regards,
> Matt
>
> [1] https://community.hortonworks.com/questions/81291/nifi-executescript-processor-error-using-string-in.html
>
>
> On Thu, Feb 2, 2017 at 5:38 PM, James McMahon <jsmcmahon3@gmail.com> wrote:
> > I have a flowfile that has tagged character information I need to get at
> > throughout the first few sections of the file. I need to use regex in python
> > to select some of those values and to transform others. I am using an
> > ExecuteScript processor to execute my python code. Here is my approach:
> >
> >
> >
> > = = = = =
> >
> > class PyStreamCallback(StreamCallback):
> >
> >    def __init__(self):
> >       pass
> >
> >    def process(self, inputStream, outputStream):
> >
> >       # what happens to my binary and extreme chars when they get passed
> >       # through this step?
> >       stuff = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
> >
> >       .
> >
> >       . (transform and pick out select content)
> >
> >       .
> >
> >       # am I using the wrong functions to put my text chars and my binary
> >       # and my extreme chars back on the stream as a byte stream? What
> >       # should I be doing to handle the variety of data?
> >       outputStream.write(bytearray(stuff.encode('utf-8')))
> >
> >
> >
> > flowFile = session.get()
> >
> > if flowFile is not None:
> >
> >    incoming = flowFile.getAttribute('filename')
> >
> >    logging.info('about to process file: %s', incoming)
> >
> >    flowFile = session.write(flowFile, PyStreamCallback())   # line 155 in my code
> >
> >    session.transfer(flowFile, REL_SUCCESS)
> >
> >    session.commit()
> >
> >
> >
> > = = = = =
> >
> >
> >
> > When my incoming flowfile is all character content - such as tagged xml - my
> > code works fine. All the flowfiles that also contain some binary data and/or
> > characters at the extremes such as foreign language characters don’t work.
> > They error out. I suspect it has to do with the way I am writing back to the
> > flowfile stream.
> >
> >
> >
> > Here is the error I am getting:
> >
> > org.apache.nifi.processor.exception.ProcessException:
> > javax.script.ScriptException: TypeError: write(): 1st arg can't be coerced
> > to int, byte[] in <script> at line number 155
> >
> >
> >
> > How should I handle the write back to the flowfile in cases where I have a
> > mix of character and binary?
> >
> >
> >
> > Note: I must do this programmatically. I tried using a combination of
> > SplitContent and MergeContent, but I have no consistent reliable demarcation
> > between the regular text characters and the other more challenging
> > characters that I can split on.
> >
> > All the examples I've found handle more pure circumstances than mine seems
> > to be. For example, all text. Or all JSON. I've not yet been able to find an
> > example that shows me how to write back to the stream for mixed data
> > situations. Can you help?
>
