Thanks Joe,

The use case is that I'm receiving data without knowing what character set it is in.  --mime-encoding gives its best guess at the character set rather than the content type.

The ListFile/FetchFile combo sounds interesting, but I wonder if I really even need that.  I don't want to leave the files in place; I just want to run an external command on them as part of the data flow.  Is there a way I can run an external command against the physical file, such as /opt/nifi/somedir/12345.uuid?  Would that info be in an attribute somewhere?  It just seems wasteful to make an extra copy of the file in order to run a read-only command on it and then delete it.  If ListFile is still the right way to go, please let me know.
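
One thing I may try in the meantime, as a rough sketch only: as far as I understand it, ExecuteStreamCommand hands the flow file content to the command on stdin, and file(1) tests standard input when given "-" as the filename, so the command might not need a path on disk at all.  The little Python script below imitates that idea outside of NiFi.  The function name guess_encoding and the default argument list are my own placeholders, not anything from NiFi.

```python
import subprocess
from typing import Sequence

# Assumed command: the same flags as in my flow, with "-" in place of
# ${filename} so that file(1) inspects standard input instead of a path.
DEFAULT_COMMAND = ("file", "-b", "--mime-encoding", "-")


def guess_encoding(data: bytes, command: Sequence[str] = DEFAULT_COMMAND) -> str:
    """Pipe raw bytes to an external command and return its trimmed stdout.

    This mimics a processor streaming content to the command's stdin,
    so no temporary copy of the file is written.
    """
    result = subprocess.run(
        list(command),
        input=data,                # content goes in on stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,                # raise if the command exits non-zero
    )
    return result.stdout.decode("ascii", errors="replace").strip()
```

If that holds up, the equivalent in the flow would be to pass "-" as the last command argument instead of ${filename}, though I have not verified that inside NiFi yet.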


On Fri, Nov 20, 2015 at 6:45 AM, Joe Witt <joe.witt@gmail.com> wrote:
For identifying the mime type you may have sufficient results with the
existing processor 'IdentifyMimeType' which you can put into the flow.

For better logic around identifying files to pull but first calling an
external command to learn more about them the upcoming
ListFile/FetchFile combo that comes from this JIRA [1] might give you
better flexibility.

[1] https://issues.apache.org/jira/browse/NIFI-631

Thanks
Joe

On Fri, Nov 20, 2015 at 12:08 AM, Charlie Frasure
<charliefrasure@gmail.com> wrote:
> Thanks everyone for the help.  The trouble started a few processors earlier
> in an ExecuteStreamCommand on ${filename} with the result of "file not
> found".  I had originally set my GetFile processor to not remove files, but
> recently changed that.  Now it seems that my ExecuteStreamCommand may not be
> the best way to accomplish this.
>
> The command that gets executed is: file -b --mime-encoding ${filename}
> in the working directory: ${absolute.path}
>
> Now that the file is no longer in the source directory when the processor
> fires, the command is broken.  I could PutFile somewhere temporarily; is
> there a better way?
>
> On Thu, Nov 19, 2015 at 10:33 PM, Joe Witt <joe.witt@gmail.com> wrote:
>>
>> Charlie,
>>
>> The fact that this is confusing is something we agree should be more
>> clear and we will improve.  We're tackling it based on what is
>> mentioned here [1].
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management
>>
>> Thanks
>> Joe
>>
>> On Thu, Nov 19, 2015 at 10:30 PM, Corey Flowers <cflowers@onyxpoint.com>
>> wrote:
>> > These guys are right. The file to look in for the uuid is the
>> > nifi-app.log. Also if you wanted to see what the processor itself was
>> > doing, you could right click on the processor, get its uuid and while it
>> > is running, run (assuming it is on Linux):
>> >
>> > tail -F nifi-app.log | grep uuid
>> >
>> > This will just scroll the logs for that specific processor and will show
>> > you what it is doing. It should also tell you specific file names and
>> > uuids of the failing files.
>> >
>> > Hope that helps! Have a great night and good luck!
>> >
>> > On Nov 19, 2015, at 9:27 PM, Juan Sequeiros <hellojuan@gmail.com> wrote:
>> >
>> > You can also check the NiFi logs for a searchable id or for what the
>> > previous processor ID produced to help search provenance.
>> >
>> > On Nov 19, 2015 21:22, "Bryan Bende" <bbende@gmail.com> wrote:
>> >>
>> >> Charlie,
>> >>
>> >> The behavior you described usually means that the processor encountered
>> >> an unexpected error which was thrown back to the framework which rolls
>> >> back the processing of that flow file and leaves it in the queue, as
>> >> opposed to an error it expected where it would usually route to a
>> >> failure relationship.
>> >>
>> >> Is the id that you see in the bulletin a uuid?
>> >>
>> >> There should still be some provenance events for this FlowFile from the
>> >> previous points in the flow. If it looks like the uuid of the FlowFile,
>> >> that should be searchable from provenance using the search button on the
>> >> right.
>> >> Let us know if we can help more.
>> >>
>> >> -Bryan
>> >>
>> >> On Thu, Nov 19, 2015 at 9:10 PM, Charlie Frasure
>> >> <charliefrasure@gmail.com> wrote:
>> >>>
>> >>> I have a question on troubleshooting a flow.  I've built a flow with
>> >>> no exception routing, just trying to process the expected values first.
>> >>> When a file exposes a problem with the logic in my flow, it queues up
>> >>> prior to the flow that is raising the bulletin.
>> >>>
>> >>> In the bulletin, I can see an id, but can't tell which file it is.
>> >>> Data provenance doesn't seem to help as it passed the flow on the last
>> >>> processor, but hasn't been logged (to my knowledge) on the next one.
>> >>>
>> >>> Is there a way to match the bulletin back to a file without creating a
>> >>> route for failed files?
>> >>
>> >>
>> >
>
>