nifi-dev mailing list archives

From Adam Taft <a...@adamtaft.com>
Subject Re: PULL ProvenanceEvent
Date Wed, 06 Nov 2019 19:07:46 GMT
+1 Joe - this is a good compromise to keep the original API undisturbed.
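
The compromise quoted below (keep the existing receive(...) signatures, add an overload that takes an ACTIVE or PASSIVE hint, and record UNSPECIFIED when no hint is given) might be sketched roughly as follows. This is only an illustration under stated assumptions: ReceiveMode, ReceiveEvent and SketchReporter are hypothetical names, and a plain String id stands in for the FlowFile that the real ProvenanceReporter.receive(...) accepts. None of these types exist in the NiFi API.

```java
import java.util.ArrayList;
import java.util.List;

public class ReceiveModeSketch {

    /** Hypothetical hint type; NiFi does not define this enum. */
    enum ReceiveMode { ACTIVE, PASSIVE, UNSPECIFIED }

    /** Simplified stand-in for the provenance event generated by the session. */
    static final class ReceiveEvent {
        final String flowFileId;   // assumption: a String id replaces the FlowFile
        final String transitUri;
        final ReceiveMode mode;    // stored on the event, not as a flowfile attribute

        ReceiveEvent(String flowFileId, String transitUri, ReceiveMode mode) {
            this.flowFileId = flowFileId;
            this.transitUri = transitUri;
            this.mode = mode;
        }
    }

    /** Toy reporter reduced to the receive call; not the real ProvenanceReporter. */
    static final class SketchReporter {
        private final List<ReceiveEvent> events = new ArrayList<>();

        // Existing-style signature: current callers keep compiling and
        // their events are recorded as UNSPECIFIED.
        void receive(String flowFileId, String transitUri) {
            receive(flowFileId, transitUri, ReceiveMode.UNSPECIFIED);
        }

        // Opt-in overload: new or updated processors can state the mode.
        void receive(String flowFileId, String transitUri, ReceiveMode mode) {
            events.add(new ReceiveEvent(flowFileId, transitUri, mode));
        }

        // Returns a copy so callers cannot mutate the recorded events.
        List<ReceiveEvent> events() {
            return new ArrayList<>(events);
        }
    }

    public static void main(String[] args) {
        SketchReporter reporter = new SketchReporter();
        reporter.receive("ff-1", "https://example.com/in");                          // legacy call
        reporter.receive("ff-2", "sftp://example.com/file.txt", ReceiveMode.ACTIVE); // GetSFTP-style pull
        reporter.receive("ff-3", "https://example.com/listen", ReceiveMode.PASSIVE); // ListenHTTP-style
        for (ReceiveEvent e : reporter.events()) {
            System.out.println(e.flowFileId + " " + e.mode);
        }
    }
}
```

Because the hint is carried by the event rather than by a flowfile attribute, it does not travel with the flowfile to later processors, which was the concern raised about the attribute-based approach.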


On Wed, Nov 6, 2019 at 11:05 AM Joe Witt <joe.witt@gmail.com> wrote:

> Nissim
>
> Notionally I am saying that session.getProvenanceReporter().receive(...)
> should have an option to call
> session.getProvenanceReporter().receive(...,ACTIVE|PASSIVE) and if not
> specified it would be UNSPECIFIED.
>
> I don't think this needs to be on the flowfile attribute - it would go
> straight to the provenance event itself which is generated by the session.
>
> Thanks
> Joe
>
> On Wed, Nov 6, 2019 at 11:32 AM Nissim Shiman <nshiman@yahoo.com.invalid>
> wrote:
>
> >  Joe,
> >
> > Just to verify what you mean,
> >
> > You are saying that the line:
> > flowfile = session.putAttribute(flowfile, "receiveType", "active")
> >
> > could be added before
> > session.getProvenanceReporter().receive(...)
> >
> >
> > to indicate a PULL.  Is this correct?
> >
> > Thanks,
> >
> > Nissim
> >
> >
> >
> >
> >
> >
> >     On Monday, November 4, 2019, 12:50:11 PM EST, Nissim Shiman
> > <nshiman@yahoo.com.invalid> wrote:
> >
> >   Having an attribute added indicating passive/active/query for RECEIVE
> > and FETCH will work,
> >
> > but nifi attributes are stateful (i.e. they will still be on the flowfile
> > as metadata a couple of processor steps down the flow)
> >
> > Maybe an option is to expand the API for RECEIVE and FETCH with a
> > new parameter for passive/active/query?
> > (i.e. the existing method signatures, such as [1], will remain the same,
> > but new ones will be added to handle this new parameter)
> >
> > [1]
> > https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
> >
> >
> >     On Thursday, October 31, 2019, 10:10:40 PM EDT, Joe Witt <
> > joe.witt@gmail.com> wrote:
> >
> >  These distinctions may be meaningful.  Adding them as an attribute
> > conveys the meaning without introducing complexity for the majority
> > case, in which the distinction isn't key.
> >
> > thanks
> >
> > On Thu, Oct 31, 2019 at 4:05 PM Nissim Shiman <nshiman@yahoo.com.invalid>
> > wrote:
> >
> > >  Mike,
> > > > I like the QUERY type as well.  Basically a more refined PULL.  Very nice.
> > >
> > >
> > > > Part of the challenge of adding PULL as a type is that there are
> > > > currently two flavors of RECEIVE: RECEIVE and FETCH [1].
> > > >
> > > > So any addition of a PULL would need a second flavor of PULL to match
> > > > the case where a flowfile's contents are being overwritten as well
> > > > (i.e. as FETCH is currently doing).
> > >
> > >
> > > [1]
> > >
> >
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java#L42
> > >
> > >
> > > Thanks,
> > > Nissim
> > >
> > >
> > >    On Wednesday, October 30, 2019, 6:41:04 PM EDT, Mike Thomsen <
> > > mikerthomsen@gmail.com> wrote:
> > >
> > >  I like the idea of creating PULL as a type. In fact, I'd propose that
> > > there
> > > are three scenarios here:
> > >
> > > RECEIVE - Passively acquire in a sort of hand-off situation. Ex: Kafka
> > > subscription
> > > > PULL - Direct operations to seek out and fetch something in a targeted
> > > > fashion. Ex: GetHTTP
> > > QUERY - Go looking for the data and take what matches your search. Ex.
> > > JsonQueryElasticsearch, GetMongo, any SQL query processor, etc.
> > >
> > >
> > >
> > > > On Wed, Oct 30, 2019 at 1:31 PM Nissim Shiman
> > > > <nshiman@yahoo.com.invalid> wrote:
> > >
> > > >  Joe,
> > > >
> > > >
> > > > It is hard to say how much value transit URI would bring to clarify a
> > > > RECEIVE.
> > > > > For example a RECEIVE with transit URI of https:<etc.> could be
> > > > > either a GetHTTP (i.e. active) or ListenHTTP (i.e. passive)
> > > >
> > > > > but your idea of "a metadata item specifying active vs passive" is a
> > > > > very clever way to make this work with minimal disruption.
> > > >
> > > > > My understanding of this is that the current receive() calls in
> > > > > ProvenanceReporter [1] will remain the same, but new ones will be
> > > > > added with a boolean parameter reflecting whether the receive is
> > > > > active or passive.  This will allow the current list of Provenance
> > > > > Events [2] to remain the same, so third-party/custom processors can
> > > > > continue working as is.
> > > >
> > > > Does this sound like what you are thinking?
> > > >
> > > >
> > > > [1]
> > > >
> > >
> >
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
> > > >
> > > > [2]
> > > > apache/nifi
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Nissim
> > > >    On Tuesday, October 29, 2019, 12:47:40 PM EDT, Joe Witt <
> > > > joe.witt@gmail.com> wrote:
> > > >
> > > >  Nissim
> > > >
> > > > > I like the idea to introduce a more refined type of event for how
> > > > > data is brought into nifi (active - PULL, passive - RECEIVE).
> > > > >
> > > > > That said it might be sufficient to simply have this distinction be
> > > > > on the "RECEIVE" event as a metadata item specifying active vs
> > > > > passive.  The protocol utilized as mentioned in the transit URI
> > > > > should clarify this though.
> > > >
> > > > > In short - I think there is a way here that is all opt-in for
> > > > > existing users and components.
> > > >
> > > > Thanks
> > > >
> > > > > On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman
> > > > > <nshiman@yahoo.com.invalid> wrote:
> > > >
> > > > >  Adam,
> > > > > good points...
> > > > > > I missed a step in explaining the use case where Provenance Events
> > > > > > are incomplete: the second nifi does a GetSFTP from the *filesystem*
> > > > > > that the first nifi is located on.
> > > > > > So the second nifi currently sends a RECEIVE event, but there is no
> > > > > > corresponding SEND event from the first nifi (nor should there be).
> > > > > > If the second nifi sent a PULL event, it would be easier for a system
> > > > > > overseer to know that there should be no corresponding SEND event.
> > > > >
> > > > > > Currently send/receive works well when nifi 1 does a PostHTTP and
> > > > > > nifi 2 does a ListenHTTP, but not in the case above.
> > > > > >
> > > > > > The ERROR case you mention is a nice point as well, although not my
> > > > > > specific issue at the moment.
> > > > > Thanks,
> > > > > Nissim
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >    On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <
> > > > > adam@adamtaft.com> wrote:
> > > > >
> > > > > >  > But a flowfile that was PULLed by the second nifi (from the first
> > > > > >  > nifi) will not necessarily have any provenance event generated by
> > > > > >  > the first nifi.
> > > > >
> > > > > > Isn't this the fault of the first NiFi to fail to emit a SEND event
> > > > > > in response to the second NiFi's request?  In this scenario,
> > > > > > shouldn't the send/receive pair be:
> > > > > > NiFi-1 [SEND] :: NiFi-2 [RECEIVE]?
> > > > >
> > > > > > What you describe is an odd use case for NiFi.  NiFi is usually not
> > > > > > in the business of acting as a file server daemon in order to "send"
> > > > > > flowfiles to other systems.  As you mention, HandleHttpResponse may
> > > > > > be a lone wolf example processor which generates a SEND event whose
> > > > > > input originates from a "listener". [1]  The other ListenXYZ
> > > > > > processors generally issue RECEIVE events because they are receiving
> > > > > > bytes, not generating them.
> > > > >
> > > > > > Are there other processors in question? Something custom? Or is
> > > > > > this related to site-to-site transfers?
> > > > >
> > > > > > I still kind of question the motive of a provenance event pair that
> > > > > > is trying to establish "who called who first".  Honestly just trying
> > > > > > to understand the use case where a matching SEND/RECEIVE pair doesn't
> > > > > > give you what you need.
> > > > >
> > > > > > The only thing I could see would be a processor that asks for data,
> > > > > > but then doesn't receive it due to some error condition.  In this
> > > > > > case, adding some sort of ERROR event might be useful.  "I attempted
> > > > > > to retrieve data from ${uri}, but the transfer failed because of
> > > > > > ${error condition}".  That way, GetXYZ processors could report an
> > > > > > error to provenance instead of as a bulletin.
> > > > >
> > > > > > If the problem is related to a processor or the framework itself
> > > > > > not generating an event, can we just fix that function to emit SEND
> > > > > > in the scenario that you describe?  Changing the provenance model
> > > > > > itself (beyond possibly adding an ERROR event) feels like it would
> > > > > > be the last scenario to consider.
> > > > >
> > > > > Thanks,
> > > > > Adam
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman
> > > > > > <nshiman@yahoo.com.invalid> wrote:
> > > > >
> > > > > >  Adam,
> > > > > > > I believe there is a need for more detailed ProvenanceEvents.
> > > > > > > A use case would be a customer that is trying to track data passed
> > > > > > > between two nifi's and trying to match up SENDs and RECEIVEs
> > > > > > >
> > > > > > > So a flowfile that has a SEND event on the first nifi should have a
> > > > > > > RECEIVE event on the second nifi.
> > > > > > > But a flowfile that was PULLed by the second nifi (from the first
> > > > > > > nifi) will not necessarily have any provenance event generated by
> > > > > > > the first nifi.
> > > > > > >
> > > > > > > (I realize that FETCH is already a "reserved word" in the current
> > > > > > > ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> > > > > > > There is another Provenance Event, ACKNOWLEDGE, which would
> > > > > > > occasionally fit this model as well (an example would be the
> > > > > > > HandleHttpResponse processor, which could send this instead of SEND
> > > > > > > when responding to an HTTP request).
> > > > > > > This being said, you make an excellent point when you said
> > > > > > > "However even more important to realize, this change would affect
> > > > > > > many other downstream consumers of provenance data which aren't
> > > > > > > necessarily in the stock NiFi distribution."
> > > > > > Thanks,
> > > > > > Nissim
> > > > > >
> > > > > >    On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
> > > > > > <nshiman@yahoo.com.invalid> wrote:
> > > > > >
> > > > > >  Adam,
> > > > > > "Yes" to your first question and the four processor examples
you
> > > > listed.
> > > > > >
> > > > > > I will need to get back to you regarding your other points.
> > > > > >
> > > > > > Thanks,
> > > > > > Nissim
> > > > > >
> > > > > > >    On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft
> > > > > > > <adam@adamtaft.com> wrote:
> > > > > >
> > > > > >  Nissim,
> > > > > >
> > > > > > > Just to be clear, you are trying to distinguish between processors
> > > > > > > which are actively "pulling" data (GetXYZ) vs. processors which
> > > > > > > just "listen" for data (ListenXYZ)?  Is that your basic vision?
> > > > > >
> > > > > > GetFile => PULL
> > > > > > GetHTTP => PULL
> > > > > > ListenHTTP => RECEIVE
> > > > > > ListenTCP => RECEIVE
> > > > > >
> > > > > > > Could you clarify what advantages this would have in terms of data
> > > > > > > provenance?  What would you use this new event type for
> > > > > > > specifically?  What are you missing now? Do you have a use case
> > > > > > > that needs this, or are you just generally trying to round out the
> > > > > > > provenance event types for sake of completeness?  I honestly don't
> > > > > > > know a use case where you care whether you polled for the data or
> > > > > > > listened for it.  The provenance model today just cares that you
> > > > > > > received the data, not so much how you received it.
> > > > > >
> > > > > > > You're right that this proposal will affect many processors and the
> > > > > > > internal visualization tools, etc.  However even more important to
> > > > > > > realize, this change would affect many other downstream consumers
> > > > > > > of provenance data which aren't necessarily in the stock NiFi
> > > > > > > distribution.  For example, any third-party/custom ReportingTask
> > > > > > > that handles provenance data would need to be updated with this
> > > > > > > change.  There's probably need for a strong vision to help
> > > > > > > demonstrate the value for this vs. the cost of the cascading
> > > > > > > effects related to this change.
> > > > > >
> > > > > > Thanks,
> > > > > > Adam
> > > > > >
> > > > > >
> > > > > > > On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman
> > > > > > > <nshiman@yahoo.com.invalid> wrote:
> > > > > >
> > > > > > > Hello Team,
> > > > > > >
> > > > > > > > The ProvenanceEventType class does a good job capturing possible
> > > > > > > > events, but the PULL event doesn't seem to fall nicely into any of
> > > > > > > > the existing types.
> > > > > > > >
> > > > > > > > https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> > > > > > > >
> > > > > > > > RECEIVE is the closest, but RECEIVE is passive and doesn't capture
> > > > > > > > the active action of a PULL
> > > > > > > >
> > > > > > > > And... maybe it would fall into FETCH, but FETCH is more focused
> > > > > > > > on contents of an existing flow file being overwritten.
> > > > > > > >
> > > > > > > > What does the community think about a new PULL event type,
> > > > > > > > or
> > > > > > > >  using FETCH for PULL, and having what FETCH does now be a new
> > > > > > > > event such as REUSE
> > > > > > > >
> > > > > > > > NOTE: a new PULL event would have a cascading effect of many
> > > > > > > > processors that currently are emitting RECEIVE's being modified to
> > > > > > > > be PULL (i.e. So GetFile would no longer be a RECEIVE, but rather a
> > > > > > > > PULL), but would more accurately capture the event.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Nissim Shiman
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > |
> > > |
> > > |
> > > |  |  |
> > >
> > >  |
> > >
> > >  |
> > > |
> > > |  |
> > > apache/nifi
> > >
> > > Mirror of Apache NiFi. Contribute to apache/nifi development by
> creating
> > > an account on GitHub.
> > >  |
> > >
> > >  |
> > >
> > >  |
> > >
> > >
> > >
> > >
> >
>
