nifi-users mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: Reading flowfile in a stream callback
Date Fri, 03 Nov 2017 13:05:29 GMT
MIME type detection can be a difficult business, but I trust Apache Tika
to do a far better job than I ever could.  The result you show for
JSON appears correct, and I'd simply add that string to the list of
routing attributes that I treat as text.  Or I'd key off the charset
being provided, as that would tell me enough to know it is text
or however I wanted to treat it.
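
A minimal sketch of that check, assuming Python and a hypothetical helper
named is_text_mime (neither the name nor the TEXT_LIKE_TYPES list comes from
NiFi). It treats a mime.type value as text when it carries a charset
parameter, starts with text/, or appears in a hand-maintained list:

```python
# Assumed, hand-maintained list of non-text/* types to treat as text.
TEXT_LIKE_TYPES = frozenset(["application/json", "application/xml"])


def is_text_mime(mime_type):
    """Return True if a mime.type attribute value looks like character data."""
    if not mime_type:
        return False
    # Split "type/subtype;param=value" into the base type and its parameters.
    parts = [p.strip().lower() for p in mime_type.split(";")]
    base = parts[0]
    has_charset = any(p.startswith("charset=") for p in parts[1:])
    return has_charset or base.startswith("text/") or base in TEXT_LIKE_TYPES
```

In a flow, the same decision would more likely be made by routing on the
mime.type attribute; the function only illustrates the logic.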

Thanks

On Fri, Nov 3, 2017 at 8:24 AM, James McMahon <jsmcmahon3@gmail.com> wrote:
> I've always found that IdentifyMimeType returns a very wide range of values
> for mime.type. It is often unclear whether mime.type is a reliable
> indicator of the nature of the content. To illustrate, I've passed file.txt
> into NiFi that contains a string representation of JSON. I'd expect this to
> be handled as textual data, but mime.type gets set to
> application/json;charset=UTF-8.
>
> Perhaps I am misusing the mime.type attribute. How have you worked around
> this challenge, Joe?
>
> On Fri, Nov 3, 2017 at 7:54 AM, Joe Witt <joe.witt@gmail.com> wrote:
>>
>> "How can I discern binary or character content using conditional checks
>> to be sure I handle the file properly?"
>>
>> Use NiFi and the existing processors where able and extend/script only
>> where necessary/critical.  For the case you mention use
>> IdentifyMimeType and route appropriate data to the appropriate script
>> execution.
>>
>> Joe
>>
>> On Fri, Nov 3, 2017 at 7:04 AM, James McMahon <jsmcmahon3@gmail.com>
>> wrote:
>> > Andy, regarding the code sample you offered above - doesn't this put
>> > into text both the attributes metadata and the payload of the flowfile?
>> >
>> > If that is the case, how does one modify that to read in from the stream
>> > into variable text only the file payload?
>> >
>> > On Fri, Nov 3, 2017 at 5:48 AM, James McMahon <jsmcmahon3@gmail.com>
>> > wrote:
>> >>
>> >> Thank you Andy. I'd like to ask just a few quick follow up questions.
>> >>
>> >> 1- My flow content may be textual characters, and it can also be binary -
>> >> jpgs, pngs, and similar. How can I discern binary or character content
>> >> using conditional checks to be sure I handle the file properly? How would
>> >> I alter this
>> >>
>> >> text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>> >>
>> >> to read in the data from the stream as binary data in that case?
>> >>
>> >> 2- In the case where my data in the flowfile payload is binary, do I have
>> >> another version of this....
>> >>
>> >> outputStream.write(bytearray(reversedText.encode('utf-8')))
>> >>
>> >> ....that omits the encoding, like so:
>> >>
>> >> outputStream.write(bytearray(some_binary))  ?
>> >>
>> >> Thank you very much in advance. -Jim
>> >>
>> >> On Thu, Nov 2, 2017 at 8:26 PM, Andy LoPresto <alopresto@apache.org>
>> >> wrote:
>> >>>
>> >>> James,
>> >>>
>> >>> The Python API should be the same as the Java FlowFile.java interface
>> >>> [1]. Matt Burgess’ blog has a good post [2] about using Jython to do
>> >>> flowfile content manipulation. Something like:
>> >>>
>> >>> flowFile = session.get()
>> >>> if flowFile is not None:
>> >>>   flowFile = session.write(flowFile, PyStreamCallback())
>> >>>   session.transfer(flowFile, REL_SUCCESS)
>> >>>
>> >>> With PyStreamCallback declared as a class above that block in the
>> >>> script:
>> >>>
>> >>> import java.io
>> >>> from org.apache.commons.io import IOUtils
>> >>> from java.nio.charset import StandardCharsets
>> >>> from org.apache.nifi.processor.io import StreamCallback
>> >>>
>> >>> class PyStreamCallback(StreamCallback):
>> >>>   def __init__(self):
>> >>>     pass
>> >>>   def process(self, inputStream, outputStream):
>> >>>     # Read the whole flowfile content as a UTF-8 string
>> >>>     text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>> >>>     reversedText = text[::-1]
>> >>>     # Write the reversed text back as UTF-8 bytes
>> >>>     outputStream.write(bytearray(reversedText.encode('utf-8')))
>> >>>
>> >>> In Groovy, you can declare the StreamCallback as an inline closure to
>> >>> make this more compact, but I believe in Jython it needs to be a
>> >>> separate
>> >>> declaration. Hope this helps.
>> >>>
>> >>> [1]
>> >>>
>> >>> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFile.java
>> >>> [2]
>> >>>
>> >>> https://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited_14.html
>> >>>
>> >>>
>> >>> Andy LoPresto
>> >>> alopresto@apache.org
>> >>> alopresto.apache@gmail.com
>> >>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>> >>>
>> >>> On Nov 2, 2017, at 12:53 PM, James McMahon <jsmcmahon3@gmail.com>
>> >>> wrote:
>> >>>
>> >>> In Python, I can use the requests library to post content, something
>> >>> like this:
>> >>>
>> >>> import requests
>> >>> url="https://abc.test.org"
>> >>> files={'file':open('/somedir/myfile.txt','rb')}
>> >>> r = requests.post(url,files=files)
>> >>>
>> >>> If I am in a python stream callback, how can I read the flowfile
>> >>> payload
>> >>> in the same way that the open() reads its file from disk?
>> >>>
>> >>>
>> >>
>> >
>
>
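
On the binary-payload questions quoted above: a rough illustration, in plain
Python, of the difference between the text path and a binary path.
io.BytesIO objects stand in for the streams, so this is an analogy that runs
outside NiFi and not the ExecuteScript API; the function names are
hypothetical. Inside a Jython callback, the byte-oriented counterpart of
IOUtils.toString is IOUtils.toByteArray(inputStream), and the resulting bytes
should be writable with outputStream.write without an encode step.

```python
import io


def reverse_text(in_stream, out_stream, encoding="utf-8"):
    """Text path: decode the bytes, transform the string, encode again."""
    text = in_stream.read().decode(encoding)
    out_stream.write(text[::-1].encode(encoding))


def copy_binary(in_stream, out_stream):
    """Binary path: no decode or encode step; bytes pass through unchanged."""
    data = in_stream.read()
    out_stream.write(data)


# The first eight bytes of a PNG file are not valid UTF-8, so they must
# take the binary path.
png_header = b"\x89PNG\r\n\x1a\n"
src, dst = io.BytesIO(png_header), io.BytesIO()
copy_binary(src, dst)
assert dst.getvalue() == png_header
```

Which path a given flowfile takes would be decided upstream, for example from
its mime.type attribute, as suggested earlier in the thread.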
