nifi-users mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: queued files
Date Fri, 20 Nov 2015 14:35:52 GMT
Charlie

Got ya.  I missed the 'encoding vs content type' thing.  I agree, let's
find a way to avoid the extra copy.  We don't expose the storage
location of the underlying bytes.  As for the ListFile idea, what I
was thinking was this (and honestly I've not tested this, so maybe I'm
skipping something important):

ListFile to get a listing of the names, etc. of the files of interest

Execute the 'file --mime-encoding ${filename}' to get more attributes
available to work with

RouteOnAttribute to decide what to do with the file next.  You can
Fetch/delete what you don't want and Fetch/pass on what you do.

I was looking for a way to check the mime-encoding by passing the
data to inspect in as an input stream, because that is actually how
ExecuteStreamCommand wants to work.
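
A rough sketch of that stdin idea, not tested inside NiFi.  It assumes
file(1) is installed and treats '-' as "read standard input" (the
common implementations do), and the helper name detect_encoding is
made up purely for illustration:

```shell
# Hypothetical helper: guess the character encoding of whatever arrives
# on standard input, instead of looking up a file by name on disk.
# Assumes file(1) is on the PATH; "-" tells it to read stdin.
detect_encoding() {
    # -b drops the leading "name:" prefix so only the encoding is printed
    file -b --mime-encoding -
}

# Usage (content is piped in, much as a FlowFile's bytes would be):
#   printf 'some text\n' | detect_encoding
```

With the content arriving on stdin there is no ${filename} and no
working directory to get wrong, so it should not matter whether the
original file is still sitting in the source directory.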

This is a use case that should be pretty easy so if you're willing to
chat through it with us we'll figure out a path to make it work well.

Thanks
Joe

On Fri, Nov 20, 2015 at 9:17 AM, Charlie Frasure
<charliefrasure@gmail.com> wrote:
> Thanks Joe,
>
> The use case is that I'm receiving data without knowing which character set
> it is coming in.  --mime-encoding is giving its best guess at the character
> set rather than the content type.
>
> The ListFile sounds interesting, but I wonder if I really even need that.  I
> don't want to leave the files in place, I just want to run an external
> command on them as part of the data flow.  Is there a way I can run an
> external command against the physical file such as
> /opt/nifi/somedir/12345.uuid?  Would that info be in an attribute somewhere?
> It just seems wasteful to make an extra copy of the file in order to run a
> read-only command on it and then delete it.  If ListFile is still the right
> way to go, please let me know.
>
>
> On Fri, Nov 20, 2015 at 6:45 AM, Joe Witt <joe.witt@gmail.com> wrote:
>>
>> For identifying the mime type you may have sufficient results with the
>> existing processor 'IdentifyMimeType' which you can put into the flow.
>>
>> For better logic around identifying files to pull, where you first call an
>> external command to learn more about them, the upcoming
>> ListFile/FetchFile combo that comes from this JIRA [1] might give you
>> better flexibility.
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-631
>>
>> Thanks
>> Joe
>>
>> On Fri, Nov 20, 2015 at 12:08 AM, Charlie Frasure
>> <charliefrasure@gmail.com> wrote:
>> > Thanks everyone for the help.  The trouble started a few processors
>> > earlier in an ExecuteStreamCommand on ${filename} with the result of
>> > "file not found".  I had originally set my GetFile processor to not
>> > remove files, but recently changed that.  Now it seems that my
>> > ExecuteStreamCommand may not be the best way to accomplish this.
>> >
>> > The command that gets executed is: file -b --mime-encoding ${filename}
>> > in the working directory: ${absolute.path}
>> >
>> > Now that the file is no longer in the source directory when the
>> > processor fires, the command is broken.  I could PutFile somewhere
>> > temporarily; is there a better way?
>> >
>> > On Thu, Nov 19, 2015 at 10:33 PM, Joe Witt <joe.witt@gmail.com> wrote:
>> >>
>> >> Charlie,
>> >>
>> >> The fact that this is confusing is something we agree should be more
>> >> clear and we will improve.  We're tackling it based on what is
>> >> mentioned here [1].
>> >>
>> >> [1] https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management
>> >>
>> >> Thanks
>> >> Joe
>> >>
>> >> On Thu, Nov 19, 2015 at 10:30 PM, Corey Flowers
>> >> <cflowers@onyxpoint.com> wrote:
>> >> > These guys are right. The file to look in for the uuid is
>> >> > nifi-app.log. Also, if you wanted to see what the processor itself
>> >> > was doing, you could right-click on the processor, get its uuid, and
>> >> > while it is running, run the following (assuming it is on Linux),
>> >> > substituting the processor's uuid:
>> >> >
>> >> > tail -F nifi-app.log | grep uuid
>> >> >
>> >> > This will just scroll the logs for that specific processor and will
>> >> > show you what it is doing. It should also tell you the specific file
>> >> > names and uuids of the failing files.
>> >> >
>> >> > Hope that helps! Have a great night and good luck!
>> >> >
>> >> > Sent from my iPhone
>> >> >
>> >> > On Nov 19, 2015, at 9:27 PM, Juan Sequeiros <hellojuan@gmail.com>
>> >> > wrote:
>> >> >
>> >> > You can also check the NiFi logs for a searchable id or for what the
>> >> > previous processor ID produced to help search provenance.
>> >> >
>> >> > On Nov 19, 2015 21:22, "Bryan Bende" <bbende@gmail.com> wrote:
>> >> >>
>> >> >> Charlie,
>> >> >>
>> >> >> The behavior you described usually means that the processor
>> >> >> encountered an unexpected error, which was thrown back to the
>> >> >> framework, which rolls back the processing of that flow file and
>> >> >> leaves it in the queue, as opposed to an error it expected, where
>> >> >> it would usually route to a failure relationship.
>> >> >>
>> >> >> Is the id that you see in the bulletin a uuid?
>> >> >>
>> >> >> There should still be some provenance events for this FlowFile from
>> >> >> the previous points in the flow. If it looks like the uuid of the
>> >> >> FlowFile, that should be searchable from provenance using the search
>> >> >> button on the right. Let us know if we can help more.
>> >> >>
>> >> >> -Bryan
>> >> >>
>> >> >> On Thu, Nov 19, 2015 at 9:10 PM, Charlie Frasure
>> >> >> <charliefrasure@gmail.com> wrote:
>> >> >>>
>> >> >>> I have a question on troubleshooting a flow.  I've built a flow
>> >> >>> with no exception routing, just trying to process the expected
>> >> >>> values first.  When a file exposes a problem with the logic in my
>> >> >>> flow, it queues up prior to the flow that is raising the bulletin.
>> >> >>>
>> >> >>> In the bulletin, I can see an id, but can't tell which file it is.
>> >> >>> Data provenance doesn't seem to help as it passed the flow on the
>> >> >>> last processor, but hasn't been logged (to my knowledge) on the
>> >> >>> next one.
>> >> >>>
>> >> >>> Is there a way to match the bulletin back to a file without
>> >> >>> creating a route for failed files?
>> >> >>
>> >> >>
>> >> >
>> >
>> >
>
>
