nifi-users mailing list archives

From Charlie Frasure <charliefras...@gmail.com>
Subject Re: queued files
Date Fri, 20 Nov 2015 14:52:59 GMT
I'm definitely game for that.  Let me know what I can do to help.
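
For reference, the stream-based check Joe describes in the quoted message could look roughly like the sketch below. It pipes the bytes to `file -b --mime-encoding -` on standard input, so no temporary copy is written to disk. This is only an illustration outside NiFi: the helper name `detect_encoding` and its `file_cmd` parameter are made up for the example, it assumes the `file` utility is installed, and nobody in this thread has tested it inside an ExecuteStreamCommand processor.

```python
import shutil
import subprocess


def detect_encoding(data: bytes, file_cmd: str = "file") -> str:
    """Return the character set that `file` guesses for `data`.

    The bytes are sent on standard input (the trailing "-" argument),
    so the content never has to exist as a named file on disk.
    `detect_encoding` and `file_cmd` are hypothetical names for this sketch.
    """
    if shutil.which(file_cmd) is None:
        raise FileNotFoundError(f"{file_cmd!r} is not on the PATH")
    result = subprocess.run(
        [file_cmd, "-b", "--mime-encoding", "-"],
        input=data,               # content is streamed to the command's stdin
        stdout=subprocess.PIPE,
        check=True,               # raise if the command exits non-zero
    )
    return result.stdout.decode("ascii", "replace").strip()
```

The returned string could then drive a routing decision, in the same way the RouteOnAttribute step in the quoted message is meant to.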

On Fri, Nov 20, 2015 at 9:35 AM, Joe Witt <joe.witt@gmail.com> wrote:

> Charlie
>
> Got ya.  I missed the 'encoding vs content type' thing.  I agree let's
> find a way to avoid the extra copy.  We don't expose the storage
> location of the underlying bytes.  So on the ListFile thing.  What I
> was thinking was this (and honestly I've not tested this so maybe I'm
> skipping something important):
>
> ListFile to get a listing of names/etc.. of interest
>
> Execute the 'file --mime-encoding ${filename}' to get more attributes
> available to work with
>
> RouteOnAttribute to decide what to do with the file next.  You can
> Fetch/delete what you don't want and Fetch/pass on what you do.
>
> I was looking for a way to check the mime-encoding while passing the
> data to detect into an input stream, because that is actually how
> ExecuteStreamCommand wants to work.
>
> This is a use case that should be pretty easy so if you're willing to
> chat through it with us we'll figure out a path to make it work well.
>
> Thanks
> Joe
>
> On Fri, Nov 20, 2015 at 9:17 AM, Charlie Frasure
> <charliefrasure@gmail.com> wrote:
> > Thanks Joe,
> >
> > The use case is that I'm receiving data without knowing what character
> > set it is coming in.  --mime-encoding is giving its best guess on
> > character set rather than the content type.
> >
> > The ListFile sounds interesting, but I wonder if I really even need
> > that.  I don't want to leave the files in place, I just want to run an
> > external command on them as part of the data flow.  Is there a way I can
> > run an external command against the physical file such as
> > /opt/nifi/somedir/12345.uuid?  Would that info be in an attribute
> > somewhere?  It just seems wasteful to make an extra copy of the file, in
> > order to run a read-only command on it, then delete it.  If ListFile is
> > still the right way to go, please let me know.
> >
> >
> > On Fri, Nov 20, 2015 at 6:45 AM, Joe Witt <joe.witt@gmail.com> wrote:
> >>
> >> For identifying the mime type you may have sufficient results with the
> >> existing processor 'IdentifyMimeType' which you can put into the flow.
> >>
> >> For better logic around identifying files to pull but first calling an
> >> external command to learn more about them the upcoming
> >> ListFile/FetchFile combo that comes from this JIRA [1] might give you
> >> better flexibility.
> >>
> >> [1] https://issues.apache.org/jira/browse/NIFI-631
> >>
> >> Thanks
> >> Joe
> >>
> >> On Fri, Nov 20, 2015 at 12:08 AM, Charlie Frasure
> >> <charliefrasure@gmail.com> wrote:
> >> > Thanks everyone for the help.  The trouble started a few processors
> >> > earlier in an ExecuteStreamCommand on ${filename} with the result of
> >> > "file not found".  I had originally set my GetFile processor to not
> >> > remove files, but recently changed that.  Now it seems that my
> >> > ExecuteStreamCommand may not be the best way to accomplish this.
> >> >
> >> > The command that gets executed is: file -b --mime-encoding ${filename}
> >> > in the working directory: ${absolute.path}
> >> >
> >> > Now that the file is no longer in the source directory when the
> >> > processor fires, the command is broken.  I could PutFile somewhere
> >> > temporarily; is there a better way?
> >> >
> >> > On Thu, Nov 19, 2015 at 10:33 PM, Joe Witt <joe.witt@gmail.com> wrote:
> >> >>
> >> >> Charlie,
> >> >>
> >> >> The fact that this is confusing is something we agree should be more
> >> >> clear and we will improve.  We're tackling it based on what is
> >> >> mentioned here [1].
> >> >>
> >> >> [1] https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management
> >> >>
> >> >> Thanks
> >> >> Joe
> >> >>
> >> >> On Thu, Nov 19, 2015 at 10:30 PM, Corey Flowers <cflowers@onyxpoint.com> wrote:
> >> >> > These guys are right. The file to look in for the uuid is
> >> >> > nifi-app.log. Also, if you wanted to see what the processor itself
> >> >> > was doing, you could right click on the processor, get its uuid and,
> >> >> > while it is running, run (assuming it is on Linux):
> >> >> >
> >> >> > tail -F nifi-app.log | grep uuid
> >> >> >
> >> >> > This will just scroll the logs for that specific processor and will
> >> >> > show you what it is doing. It should also tell you specific file
> >> >> > names and uuids of the failing files.
> >> >> >
> >> >> > Hope that helps! Have a great night and good luck!
> >> >> >
> >> >> > Sent from my iPhone
> >> >> >
> >> >> > On Nov 19, 2015, at 9:27 PM, Juan Sequeiros <hellojuan@gmail.com> wrote:
> >> >> >
> >> >> > You can also check the NiFi logs for a searchable id or for what the
> >> >> > previous processor ID produced to help search provenance.
> >> >> >
> >> >> > On Nov 19, 2015 21:22, "Bryan Bende" <bbende@gmail.com> wrote:
> >> >> >>
> >> >> >> Charlie,
> >> >> >>
> >> >> >> The behavior you described usually means that the processor
> >> >> >> encountered an unexpected error, which was thrown back to the
> >> >> >> framework, which rolls back the processing of that flow file and
> >> >> >> leaves it in the queue, as opposed to an error it expected, where it
> >> >> >> would usually route to a failure relationship.
> >> >> >>
> >> >> >> Is the id that you see in the bulletin a uuid?
> >> >> >>
> >> >> >> There should still be some provenance events for this FlowFile from
> >> >> >> the previous points in the flow. If it looks like the uuid of the
> >> >> >> FlowFile, that should be searchable from provenance using the search
> >> >> >> button on the right.
> >> >> >> Let us know if we can help more.
> >> >> >>
> >> >> >> -Bryan
> >> >> >>
> >> >> >> On Thu, Nov 19, 2015 at 9:10 PM, Charlie Frasure
> >> >> >> <charliefrasure@gmail.com> wrote:
> >> >> >>>
> >> >> >>> I have a question on troubleshooting a flow.  I've built a flow
> >> >> >>> with no exception routing, just trying to process the expected
> >> >> >>> values first.  When a file exposes a problem with the logic in my
> >> >> >>> flow, it queues up prior to the flow that is raising the bulletin.
> >> >> >>>
> >> >> >>> In the bulletin, I can see an id, but can't tell which file it is.
> >> >> >>> Data provenance doesn't seem to help as it passed the flow on the
> >> >> >>> last processor, but hasn't been logged (to my knowledge) on the
> >> >> >>> next one.
> >> >> >>>
> >> >> >>> Is there a way to match the bulletin back to a file without
> >> >> >>> creating a
> >> >> >>> route for failed files?
> >> >> >>
> >> >> >>
> >> >> >
> >> >
> >> >
> >
> >
>
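
As a side note on the `tail -F nifi-app.log | grep uuid` tip quoted above, the filtering half can be sketched in a few lines of Python for cases where the log has to be searched after the fact. The function name `lines_mentioning` is invented for this example, and it performs a plain substring match on whatever lines it is given; it does not follow a growing file the way `tail -F` does and assumes nothing about the NiFi log format.

```python
from typing import Iterable, Iterator


def lines_mentioning(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Yield each line that contains `needle`, without its trailing newline.

    `needle` would be the processor or FlowFile uuid copied from the UI
    or the bulletin. The match is a literal substring test.
    """
    for line in lines:
        if needle in line:
            yield line.rstrip("\n")
```

An open file object can be passed directly as `lines`, for example `lines_mentioning(open("nifi-app.log"), some_uuid)`, where `some_uuid` stands in for the id being searched for.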
