nifi-dev mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: PULL ProvenanceEvent
Date Wed, 06 Nov 2019 18:04:52 GMT
Nissim

Notionally I am saying that session.getProvenanceReporter().receive(...)
should have an option to call
session.getProvenanceReporter().receive(...,ACTIVE|PASSIVE) and if not
specified it would be UNSPECIFIED.

I don't think this needs to be a flowfile attribute - it would go
straight to the provenance event itself, which is generated by the session.

Thanks
Joe
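
A minimal sketch of the overload described above, assuming a hypothetical `ReceiveMode` enum. `SketchReporter` and `SketchEvent` are stand-ins written for this illustration and are not NiFi classes; the real `ProvenanceReporter.receive(...)` takes a `FlowFile` rather than a string id. The sketch only shows the mode being stored on the event and defaulting to UNSPECIFIED when the caller does not supply it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical: not part of the NiFi API.
enum ReceiveMode { ACTIVE, PASSIVE, UNSPECIFIED }

// Stand-in for a provenance event. The mode is a field of the event,
// so nothing is written to the flowfile's attributes.
final class SketchEvent {
    final String flowFileId;
    final String transitUri;
    final ReceiveMode mode;

    SketchEvent(String flowFileId, String transitUri, ReceiveMode mode) {
        this.flowFileId = flowFileId;
        this.transitUri = transitUri;
        this.mode = mode;
    }
}

// Stand-in for the reporter obtained from session.getProvenanceReporter().
final class SketchReporter {
    private final List<SketchEvent> events = new ArrayList<>();

    // Existing-style call: callers that pass no mode get UNSPECIFIED.
    void receive(String flowFileId, String transitUri) {
        receive(flowFileId, transitUri, ReceiveMode.UNSPECIFIED);
    }

    // Opt-in overload carrying the active/passive distinction.
    void receive(String flowFileId, String transitUri, ReceiveMode mode) {
        events.add(new SketchEvent(flowFileId, transitUri, mode));
    }

    List<SketchEvent> events() {
        return Collections.unmodifiableList(events);
    }
}

class ReceiveModeDemo {
    public static void main(String[] args) {
        SketchReporter reporter = new SketchReporter();
        reporter.receive("ff-1", "https://example.invalid/listen");
        reporter.receive("ff-2", "sftp://example.invalid/outbox", ReceiveMode.ACTIVE);
        for (SketchEvent e : reporter.events()) {
            System.out.println(e.flowFileId + " " + e.mode);
        }
    }
}
```

Because the two-argument call delegates to the three-argument one, existing callers would not need to change under this arrangement.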

On Wed, Nov 6, 2019 at 11:32 AM Nissim Shiman <nshiman@yahoo.com.invalid>
wrote:

>  Joe,
>
> Just to verify what you mean,
>
> You are saying that the line:
> flowfile = session.putAttribute(flowfile, "receiveType", "active")
>
> could be added before
> session.getProvenanceReporter().receive(...)
>
>
> to indicate a PULL.  Is this correct?
>
> Thanks,
>
> Nissim
>
>
>
>
>
>
>     On Monday, November 4, 2019, 12:50:11 PM EST, Nissim Shiman
> <nshiman@yahoo.com.invalid> wrote:
>
>   Having an attribute added indicating passive/active/query for RECEIVE
> and FETCH will work,
>
> but NiFi attributes are stateful (i.e. they will still be on the flowfile
> as metadata a couple of processor steps down the flow).
>
> Maybe an option is to expand the API for RECEIVE and FETCH with a
> new parameter for passive/active/query?
> (i.e. the existing method signatures, such as [1], will remain the same,
> but new ones will be added to handle this new parameter)
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
>
>
>     On Thursday, October 31, 2019, 10:10:40 PM EDT, Joe Witt <
> joe.witt@gmail.com> wrote:
>
>  These distinctions may be meaningful.  Adding them as an attribute lets
> the meaning be conveyed without introducing complexity for the majority
> case, in which the distinction isn't key.
>
> thanks
>
> On Thu, Oct 31, 2019 at 4:05 PM Nissim Shiman <nshiman@yahoo.com.invalid>
> wrote:
>
> >  Mike,
> > I like the QUERY type as well.  Basically a more refined PULL.  Very nice.
> >
> >
> > Part of the challenge of adding PULL as a type is that there are currently
> > two flavors of RECEIVEs:
> > RECEIVE and FETCH [1]
> >
> > So any addition of a PULL would need a second flavor of PULL to match the
> > case where a flowfile's contents are being overwritten as well (i.e. as
> > FETCH is currently doing)
> >
> >
> > [1]
> > https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java#L42
> >
> >
> > Thanks,
> > Nissim
> >
> >
> >    On Wednesday, October 30, 2019, 6:41:04 PM EDT, Mike Thomsen <
> > mikerthomsen@gmail.com> wrote:
> >
> >  I like the idea of creating PULL as a type. In fact, I'd propose that there
> > are three scenarios here:
> >
> > RECEIVE - Passively acquire in a sort of hand-off situation. Ex: Kafka
> > subscription
> > PULL - Direct operations to seek out and fetch something in a targeted
> > fashion. Ex. GetHttp
> > QUERY - Go looking for the data and take what matches your search. Ex.
> > JsonQueryElasticsearch, GetMongo, any SQL query processor, etc.
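
The three scenarios above could be sketched as a small enum. This is an illustration only: `AcquisitionStyle` and the lookup table are hypothetical names, not existing NiFi types, and the processors listed are just examples mentioned in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical classification; not an existing NiFi type.
enum AcquisitionStyle {
    RECEIVE, // passively handed the data
    PULL,    // targeted fetch of a known resource
    QUERY    // search, then take whatever matches
}

final class AcquisitionExamples {
    // Example processors named in this thread, mapped to the proposed styles.
    static Map<String, AcquisitionStyle> examples() {
        Map<String, AcquisitionStyle> m = new LinkedHashMap<>();
        m.put("ListenHTTP", AcquisitionStyle.RECEIVE);
        m.put("GetHTTP", AcquisitionStyle.PULL);
        m.put("GetMongo", AcquisitionStyle.QUERY);
        m.put("JsonQueryElasticsearch", AcquisitionStyle.QUERY);
        return m;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, AcquisitionStyle> e : examples().entrySet()) {
            System.out.println(e.getKey() + " => " + e.getValue());
        }
    }
}
```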
> >
> >
> >
> > On Wed, Oct 30, 2019 at 1:31 PM Nissim Shiman <nshiman@yahoo.com.invalid>
> > wrote:
> >
> > >  Joe,
> > >
> > >
> > > It is hard to say how much value transit URI would bring to clarify a
> > > RECEIVE.
> > > For example a RECEIVE with transit URI of https:<etc.> could be either a
> > > GetHTTP (i.e. active) or ListenHTTP (i.e. passive)
> > >
> > > but your idea of "a metadata item specifying active vs passive" is a very
> > > clever way to make this work with minimal disruptions.
> > >
> > > My understanding of this is that the current receive() calls in
> > > ProvenanceReporter [1] will remain the same, but new ones will be added
> > > with a boolean parameter reflecting if the receive is active or passive.
> > > This will allow the current list of Provenance Events [2] to remain the
> > > same.  So third party/custom processors can continue working as is.
> > > Does this sound like what you are thinking?
> > >
> > >
> > > [1]
> > > https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceReporter.java#L46
> > >
> > > [2]
> > > apache/nifi
> > >
> > >
> > > Thanks,
> > >
> > > Nissim
> > >    On Tuesday, October 29, 2019, 12:47:40 PM EDT, Joe Witt <
> > > joe.witt@gmail.com> wrote:
> > >
> > >  Nissim
> > >
> > > I like the idea to introduce a more refined type of event for how data is
> > > brought into nifi (active - PULL, passive - RECEIVE).
> > >
> > > That said it might be sufficient to simply have this distinction be on the
> > > "RECEIVE" event as a metadata item specifying active vs passive.  The
> > > protocol utilized as mentioned in the transit URI should clarify this
> > > though.
> > >
> > > In short - I think there is a way here that is all opt-in for existing
> > > users and components.
> > >
> > > Thanks
> > >
> > > On Tue, Oct 29, 2019 at 9:41 AM Nissim Shiman <nshiman@yahoo.com.invalid>
> > > wrote:
> > >
> > > >  Adam,
> > > > good points...
> > > > I missed a step in explaining the use case where Provenance Events are
> > > > incomplete...
> > > > Where the second nifi does a GetSFTP from the *filesystem* that the first
> > > > nifi is located on
> > > > So the second nifi currently sends a RECEIVE event, but there is no
> > > > corresponding SEND event from the first nifi (nor should there be)
> > > > If the second nifi sent a PULL event, it would be easier for a system
> > > > overseer to know that there should be no corresponding SEND event
> > > >
> > > > Currently send/receive works well when nifi 1 does a PostHTTP and nifi 2
> > > > does a ListenHTTP, but not in the case above.
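
To illustrate the reconciliation problem described here, the following sketch checks RECEIVE events on the second NiFi against SEND events on the first. Everything in it is an assumption made for the example: the class names, the `pulled` flag, and in particular the idea that both sides share an identifier that can be compared directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical view of a RECEIVE-style event on the second NiFi.
final class ReceivedRecord {
    final String id;      // assumed identifier shared by both instances
    final boolean pulled; // true if the receiver actively fetched the data

    ReceivedRecord(String id, boolean pulled) {
        this.id = id;
        this.pulled = pulled;
    }
}

final class SendReceiveAudit {
    // Returns ids that were received passively but have no SEND on the other side.
    // Pulled data is skipped because no SEND is expected for it.
    static List<String> missingSends(Set<String> sentIds, List<ReceivedRecord> received) {
        List<String> missing = new ArrayList<>();
        for (ReceivedRecord r : received) {
            if (!r.pulled && !sentIds.contains(r.id)) {
                missing.add(r.id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> sent = new HashSet<>(Arrays.asList("a"));
        List<ReceivedRecord> received = Arrays.asList(
                new ReceivedRecord("a", false),  // PostHTTP to ListenHTTP, matched
                new ReceivedRecord("b", true),   // GetSFTP pull, no SEND expected
                new ReceivedRecord("c", false)); // passive receive with no SEND
        System.out.println(missingSends(sent, received)); // prints [c]
    }
}
```

Without the `pulled` flag, record "b" would be reported as a missing SEND as well, which is the ambiguity being described.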
> > > >
> > > > The ERROR case you mention is a nice point as well, although not my
> > > > specific issue at the moment.
> > > > Thanks,
> > > > Nissim
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >    On Monday, October 28, 2019, 11:52:57 PM EDT, Adam Taft <
> > > > adam@adamtaft.com> wrote:
> > > >
> > > >  > But a flowfile that was PULLed by the second nifi (from the first nifi)
> > > >  > will not necessarily have any provenance event generated by the first nifi.
> > > >
> > > > Isn't this the fault of the first NiFi to fail to emit a SEND event in
> > > > response to the second NiFi's request?  In this scenario, shouldn't the
> > > > send/receive pair be:
> > > > NiFi-1 [SEND] :: NiFi-2 [RECEIVE]?
> > > >
> > > > What you describe is an odd use case for NiFi.  NiFi is usually not in the
> > > > business of acting as a file server daemon in order to "send" flowfiles to
> > > > other systems.  As you mention, HandleHttpResponse may be a lone wolf
> > > > example processor which generates a SEND event whose input originates from
> > > > a "listener". [1]  The other ListenXYZ processors generally issue RECEIVE
> > > > events because they are receiving bytes, not generating them.
> > > >
> > > > Are there other processors in question? Something custom? Or is this
> > > > related to site-to-site transfers?
> > > >
> > > > I still kind of question the motive of a provenance event pair that is
> > > > trying to establish "who called who first".  Honestly just trying to
> > > > understand the use case where a matching SEND/RECEIVE pair doesn't give you
> > > > what you need.
> > > >
> > > > The only thing I could see would be a processor that asks for data, but
> > > > then doesn't receive it due to some error condition.  In this case, adding
> > > > some sort of ERROR event might be useful.  "I attempted to retrieve data
> > > > from ${uri}, but the transfer failed because of ${error condition}".  That
> > > > way, GetXYZ processors could report an error to provenance instead of as a
> > > > bulletin.
> > > >
> > > > If the problem is related to a processor or the framework itself not
> > > > generating an event, can we just fix that function to emit SEND in the
> > > > scenario that you describe?  Changing the provenance model itself (beyond
> > > > possibly adding an ERROR event) feels like it would be the last scenario to
> > > > consider.
> > > >
> > > > Thanks,
> > > > Adam
> > > >
> > > > [1]
> > > > https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/HandleHttpResponse.java#L191
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Oct 28, 2019 at 4:47 PM Nissim Shiman <nshiman@yahoo.com.invalid>
> > > > wrote:
> > > >
> > > > >  Adam,
> > > > > I believe there is a need for more detailed ProvenanceEvents.
> > > > > A use case would be a customer that is trying to track data passed between
> > > > > two nifi's and trying to match up SENDs and RECEIVEs
> > > > >
> > > > > So a flowfile that has a SEND event on the first nifi should have a
> > > > > RECEIVE event on the second nifi.
> > > > > But a flowfile that was PULLed by the second nifi (from the first nifi)
> > > > > will not necessarily have any provenance event generated by the first nifi.
> > > > >
> > > > > (I realize that FETCH is already a "reserved word" in the current
> > > > > ProvenanceEvents setup, so I was hoping PULL could be used instead.)
> > > > > There is another Provenance Event, ACKNOWLEDGE, which would also
> > > > > occasionally fit this model (an example would be the HandleHttpResponse
> > > > > processor, which could send this instead of SEND when responding to an HTTP
> > > > > request)
> > > > > This being said, you make an excellent point when you said
> > > > > "However even more important to realize,
> > > > > this change would affect many other downstream consumers of provenance data
> > > > > which aren't necessarily in the stock NiFi distribution."
> > > > > Thanks,
> > > > > Nissim
> > > > >
> > > > >    On Friday, October 11, 2019, 11:30:19 AM EDT, Nissim Shiman
> > > > > <nshiman@yahoo.com.invalid> wrote:
> > > > >
> > > > >  Adam,
> > > > > "Yes" to your first question and the four processor examples you listed.
> > > > >
> > > > > I will need to get back to you regarding your other points.
> > > > >
> > > > > Thanks,
> > > > > Nissim
> > > > >
> > > > >    On Thursday, October 10, 2019, 7:05:57 PM EDT, Adam Taft <
> > > > > adam@adamtaft.com> wrote:
> > > > >
> > > > >  Nissim,
> > > > >
> > > > > Just to be clear, you are trying to distinguish between processors which
> > > > > are actively "pulling" data (GetXYZ) vs. processors which just "listen" for
> > > > > data (ListenXYZ)?  Is that your basic vision?
> > > > >
> > > > > GetFile => PULL
> > > > > GetHTTP => PULL
> > > > > ListenHTTP => RECEIVE
> > > > > ListenTCP => RECEIVE
> > > > >
> > > > > Could you clarify what advantages this would have in terms of data
> > > > > provenance?  What would you use this new event type for specifically?  What
> > > > > are you missing now? Do you have a use case that needs this, or are you
> > > > > just generally trying to round out the provenance event types for sake of
> > > > > completeness?  I honestly don't know a use case where you care whether you
> > > > > polled for the data or listened for it.  The provenance model today just
> > > > > cares that you received the data, not so much how you received it.
> > > > >
> > > > > You're right that this proposal will affect many processors and the
> > > > > internal visualization tools, etc.  However even more important to realize,
> > > > > this change would affect many other downstream consumers of provenance data
> > > > > which aren't necessarily in the stock NiFi distribution.  For example, any
> > > > > third-party/custom ReportingTask that handles provenance data would need to
> > > > > be updated with this change.  There's probably need for a strong vision to
> > > > > help demonstrate the value for this vs. the cost of the cascading effects
> > > > > related to this change.
> > > > >
> > > > > Thanks,
> > > > > Adam
> > > > >
> > > > >
> > > > > On Thu, Oct 10, 2019 at 4:02 PM Nissim Shiman <nshiman@yahoo.com.invalid>
> > > > > wrote:
> > > > >
> > > > > > Hello Team,
> > > > > >
> > > > > > The ProvenanceEventType class does a good job capturing possible events,
> > > > > > but the PULL event doesn't seem to fall nicely into any of the existing
> > > > > > types.
> > > > > >
> > > > > > https://gitbox.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/provenance/ProvenanceEventType.java
> > > > > >
> > > > > > RECEIVE is the closest, but RECEIVE is passive and doesn't capture the
> > > > > > active action of a PULL
> > > > > >
> > > > > > And... maybe it would fall into FETCH, but FETCH is more focused on
> > > > > > contents of an existing flow file being overwritten.
> > > > > >
> > > > > > What does the community think about a new PULL event type,
> > > > > > or
> > > > > >  using FETCH for PULL, and having what FETCH does now be a new event such
> > > > > > as REUSE
> > > > > >
> > > > > > NOTE: a new PULL event would have a cascading effect of many processors
> > > > > > that currently are emitting RECEIVE's being modified to be PULL
> > > > > > (i.e. So GetFile would no longer be a RECEIVE, but rather a PULL), but
> > > > > > would more accurately capture the event.
> > > > > >
> > > > > > Thanks,
> > > > > > Nissim Shiman
