nifi-dev mailing list archives

From Mike Thomsen <mikerthom...@gmail.com>
Subject Re: Adding schema inference
Date Thu, 23 May 2019 22:29:06 GMT
I think tabling that feature for this processor is probably the right way
to go for now. It really should be part of a larger upgrade to the Record
API, because just jamming in new fields without warning here could be very
unwelcome to some users, such as my current project team (we have much
stricter controls on schemas than most of my other engagements).

(FWIW, we're actually using this processor on real data)

Thanks,

Mike
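
To make the two options discussed in this thread concrete (fail when a nested record level is missing, or create it "mkdir -p" style), here is a minimal sketch. It uses plain Python dicts as stand-ins for records and does not use the NiFi Record API; the function name upsert_path and the create_missing flag are hypothetical, chosen only for illustration.

```python
def upsert_path(record, path, value, create_missing=True):
    """Set `value` at a nested `path`, creating missing levels like `mkdir -p`.

    `record` is a plain dict standing in for a record (an assumption for
    this sketch). With create_missing=False, a missing level raises
    instead, which models the strict-schema option.
    """
    if not path:
        raise ValueError("path must contain at least one field name")
    current = record
    for name in path[:-1]:
        child = current.get(name)
        if child is None:
            if not create_missing:
                raise KeyError(f"record level {name!r} does not exist")
            # Create a new, initially empty level; it ends up with one field.
            child = {}
            current[name] = child
        elif not isinstance(child, dict):
            raise TypeError(f"field {name!r} is not a nested record")
        current = child
    current[path[-1]] = value
    return record


record = {"id": 1}
upsert_path(record, ["geo", "location", "city"], "Austin")
print(record)
```

Calling the same function with create_missing=False on a record that lacks the "geo" level raises a KeyError, which is roughly the behavior a team with strict schema controls would want by default.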

On Thu, May 23, 2019 at 6:25 PM Matt Burgess <mattyb149@apache.org> wrote:

> I think it'd be great to create one-field records for each level that
> doesn't exist. For things like LookupRecord and/or UpdateRecord (from
> the aforementioned Jira) I had envisioned the same thing. It would
> make it easier for other such processors to enrich/enlarge a
> newly-existing record level, basically an Upsert pattern (or mkdir -p
> :)
>
> Regards,
> Matt
>
> On Tue, May 21, 2019 at 2:32 PM Mike Thomsen <mikerthomsen@gmail.com> wrote:
> >
> > All things considered, it doesn't seem like that much work. My main
> > concern is how to handle it if a user does something like nest the data
> > deep into multiple schema levels that don't exist. How should that be
> > handled? An exception or generating empty record declarations that have
> > one field?
> >
> > On Tue, May 21, 2019 at 2:07 PM Matt Burgess <mattyb149@apache.org> wrote:
> >
> > > Mike,
> > >
> > > Check AbstractRouteRecord, it uses the read-first-get-schema-read-rest
> > > pattern. However for that snippet, it is the RecordReader that is
> > > possibly updating the schema (currently the only thing that does this
> > > is schema inference), then the RecordSetWriter is created using the
> > > (possibly updated) schema. For your PR, you might need to update the
> > > schema manually and then pass that into the
> > > RecordSetWriterFactory.getSchema() call.
> > >
> > > Now that I think of it, you might be the first to do this
> > > automatic-schema-update-on-write thing (if you choose to do so). I
> > > thought NIFI-5524 [1] had been implemented but apparently not.
> > >
> > > Regards,
> > > Matt
> > >
> > > [1] https://issues.apache.org/jira/browse/NIFI-5524
> > >
> > >
> > > On Tue, May 21, 2019 at 10:08 AM Mike Thomsen <mikerthomsen@gmail.com> wrote:
> > > >
> > > > Matt left this suggestion:
> > > >
> > > > https://github.com/apache/nifi/pull/3231#discussion_r285693972
> > > >
> > > > What would be a good example of that pattern if I wanted to update
> > > > that PR and document the process for others?
> > > >
> > > > Thanks,
> > > >
> > > > Mike
> > >
>
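
The read-first-get-schema-read-rest pattern that Matt describes can be sketched roughly as follows. This is a language-neutral illustration in plain Python, not NiFi code: infer_schema, copy_with_schema, and the extra_fields parameter are hypothetical names, and the "writer" is just a text stream whose first line carries the schema. The extra_fields step stands in for manually updating the schema before the writer is created.

```python
import io
import json


def infer_schema(record):
    """Toy stand-in for schema inference: field name -> Python type name."""
    return {name: type(value).__name__ for name, value in record.items()}


def copy_with_schema(records, out, extra_fields=None):
    """Read the first record, fix the schema, then stream the rest.

    The schema is only known after the first record has been read, so
    the output header (our stand-in for creating the writer) is written
    after that read and before any record is written.
    """
    records = iter(records)
    first = next(records, None)
    if first is None:
        return None  # nothing to read, so no schema and no output
    schema = infer_schema(first)
    # Manual schema update before the "writer" is created (assumption:
    # the caller knows which fields it is about to add).
    schema.update(extra_fields or {})
    out.write(json.dumps({"schema": schema}) + "\n")
    out.write(json.dumps(first) + "\n")
    for record in records:
        out.write(json.dumps(record) + "\n")
    return schema


out = io.StringIO()
schema = copy_with_schema(
    [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    out,
    extra_fields={"city": "str"},
)
print(schema)
print(out.getvalue())
```

The point of the ordering is that the output side never sees a record before the (possibly updated) schema has been settled, which is the property the pattern is meant to guarantee.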
