nifi-users mailing list archives

From James McMahon <jsmcmah...@gmail.com>
Subject Re: Reading flowfile in a stream callback
Date Wed, 08 Nov 2017 13:31:04 GMT
Thank you Andy, thank you again Joe. I'll rethink my approach based on your
recommendations.  -Jim

On Fri, Nov 3, 2017 at 1:31 PM, Andy LoPresto <alopresto@apache.org> wrote:

> James,
>
> I am not a Python expert, so I’m glad other people could weigh in. As far
> as routing on content type, I agree with Joe’s sentiment that
> IdentifyMimeType and RouteOnAttribute are the correct solutions there. You
> can route on a range of input options (the actual type, detected charset,
> etc.).
>
> I would definitely avoid putting code to handle multiple disparate content
> types (text vs. video, etc.) in the same ExecuteScript processor. This will
> be harder to test, maintain, enhance, etc. You’ll eventually reach a Switch
> Statement of Doom. Instead, approach this as each ES processor is a black
> box like a Unix tool — it does one thing really well — and chain them
> together. This is the philosophy NiFi is built on and you’ll have much more
> success swimming with the current than fighting it.
>
>
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Nov 3, 2017, at 6:05 AM, Joe Witt <joe.witt@gmail.com> wrote:
>
> Mime type detection can be difficult business but I trust Apache Tika
> to do a far better job than I ever could.  The result you show for
> JSON appears correct and I'd simply add that string to the list of
> routing attributes that I treat as text.  Or I'd key off the charset
> being provided as that would tell me enough to know it is text
> or however I wanted to treat it.
>
> Thanks
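
A minimal sketch of the decision Joe describes, written as plain Python so it can be tried outside NiFi. It treats a mime.type value as text if the media type starts with text/, if it appears in a list of types treated as text, or if a charset parameter is present. The names parse_mime_type, is_textual and TEXT_LIKE_TYPES are hypothetical, and the contents of the list are an assumption, not something from this thread beyond application/json. In a real flow the same check would more likely live in RouteOnAttribute than in a script.

```python
# Hypothetical allow-list of non-"text/*" types that should still be
# handled as text. Only application/json comes from the thread; the
# rest is an illustrative assumption.
TEXT_LIKE_TYPES = frozenset(['application/json', 'application/xml'])


def parse_mime_type(value):
    """Split 'type/subtype;key=val' into (media_type, params dict)."""
    parts = [p.strip() for p in value.split(';')]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if '=' in part:
            key, _, val = part.partition('=')
            params[key.strip().lower()] = val.strip().strip('"')
    return media_type, params


def is_textual(mime_type):
    """Return True if the mime.type value should be routed as text."""
    media_type, params = parse_mime_type(mime_type)
    return (media_type.startswith('text/')
            or media_type in TEXT_LIKE_TYPES
            or 'charset' in params)
```

With this, the value from James's example, application/json;charset=UTF-8, is classified as text both because it is on the list and because it carries a charset, while image/png is not.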
>
> On Fri, Nov 3, 2017 at 8:24 AM, James McMahon <jsmcmahon3@gmail.com>
> wrote:
>
> I've always found that IdentifyMimeType returns a very wide range of values
> for mime.type, and it is often unclear whether mime.type is a reliable
> indicator of the nature of the content. To illustrate, I've passed a file.txt
> containing a string representation of JSON into NiFi. I'd expect this to
> be handled as textual data, but mime.type gets set to
> application/json;charset=UTF-8.
>
> Perhaps I am misusing the attribute mime.type. How have you worked around
> this challenge, Joe?
>
> On Fri, Nov 3, 2017 at 7:54 AM, Joe Witt <joe.witt@gmail.com> wrote:
>
>
> "How can I discern binary or character content using conditional checks
> to be sure I handle the file properly?"
>
> Use NiFi and the existing processors where able and extend/script only
> where necessary/critical.  For the case you mention use
> IdentifyMimeType and route appropriate data to the appropriate script
> execution.
>
> Joe
>
> On Fri, Nov 3, 2017 at 7:04 AM, James McMahon <jsmcmahon3@gmail.com>
> wrote:
>
> Andy, regarding the code sample you offered above - doesn't this read both
> the attribute metadata and the payload of the flowfile into text?
>
> If that is the case, how does one modify it to read only the file payload
> from the stream into the variable text?
>
> On Fri, Nov 3, 2017 at 5:48 AM, James McMahon <jsmcmahon3@gmail.com>
> wrote:
>
>
> Thank you Andy. I'd like to ask just a few quick follow up questions.
>
> 1- My flow content may be textual characters, and it can also be binary -
> jpgs, pngs, and similar. How can I discern binary or character content
> using conditional checks to be sure I handle the file properly? How would I
> alter this
>
> text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>
> to read in the data from the stream as binary data in that case?
>
> 2- In the case where my data in the flowfile payload is binary, do I have
> another version of this....
>
> outputStream.write(bytearray(reversedText.encode('utf-8')))
>
> ....that omits the encoding, like so:
>
> outputStream.write(bytearray(some_binary))  ?
>
> Thank you very much in advance. -Jim
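
A minimal sketch of the difference behind the two questions above, using plain Python file-like objects (io.BytesIO) as stand-ins for the NiFi input and output streams so it runs outside NiFi. The function names process_text and process_binary are hypothetical. The text path decodes and re-encodes; the binary path moves raw bytes and never touches a charset. Inside a Jython StreamCallback the raw-bytes read would presumably use a call such as IOUtils.toByteArray(inputStream) from Commons IO in place of read(); that mapping is an assumption here and is not shown in the code.

```python
import io


def process_text(input_stream, output_stream):
    """Text path: decode as UTF-8, transform, re-encode."""
    text = input_stream.read().decode('utf-8')
    reversed_text = text[::-1]
    output_stream.write(reversed_text.encode('utf-8'))


def process_binary(input_stream, output_stream):
    """Binary path: read raw bytes and write them back unchanged.

    No decode/encode step, so arbitrary byte values survive intact.
    """
    payload = input_stream.read()
    output_stream.write(payload)


# Stand-in for binary flowfile content (starts with the PNG signature).
sample = b'\x89PNG\r\n\x1a\n\x00\xff'
binary_out = io.BytesIO()
process_binary(io.BytesIO(sample), binary_out)
```

Sending the same sample bytes through process_text raises UnicodeDecodeError, which is why the two kinds of content are better routed to separate scripts than handled by one.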
>
> On Thu, Nov 2, 2017 at 8:26 PM, Andy LoPresto <alopresto@apache.org>
> wrote:
>
>
> James,
>
> The Python API should be the same as the Java FlowFile.java interface
> [1]. Matt Burgess’ blog has a good post [2] about using Jython to do
> flowfile content manipulation. Something like:
>
> flowFile = session.get()
> if flowFile is not None:
>     flowFile = session.write(flowFile, PyStreamCallback())
>     session.transfer(flowFile, REL_SUCCESS)
>
> With PyStreamCallback declared as a class above that block in the script:
>
> from org.apache.commons.io import IOUtils
> from java.nio.charset import StandardCharsets
> from org.apache.nifi.processor.io import StreamCallback
>
> class PyStreamCallback(StreamCallback):
>     def __init__(self):
>         pass
>
>     def process(self, inputStream, outputStream):
>         # Read the entire flowfile content as UTF-8 text
>         text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>         # Example transformation: reverse the text
>         reversedText = text[::-1]
>         # Write the result back out as UTF-8 bytes
>         outputStream.write(bytearray(reversedText.encode('utf-8')))
>
> In Groovy, you can declare the StreamCallback as an inline closure to
> make this more compact, but I believe in Jython it needs to be a separate
> declaration. Hope this helps.
>
> [1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFile.java
> [2] https://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited_14.html
>
>
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Nov 2, 2017, at 12:53 PM, James McMahon <jsmcmahon3@gmail.com>
> wrote:
>
> In Python, I can use the requests library to post content, something like
> this:
>
> import requests
> url="https://abc.test.org"
> files={'file':open('/somedir/myfile.txt','rb')}
> r = requests.post(url,files=files)
>
> If I am in a Python stream callback, how can I read the flowfile payload
> in the same way that open() reads its file from disk?
>