nifi-users mailing list archives

From Boris Tyukin <bo...@boristyukin.com>
Subject Re: AVRO is the only output format with ExecuteSQL
Date Tue, 07 Aug 2018 16:37:29 GMT
Thanks for all the responses! It means I am not the only one interested in
this topic.

A record-aware version would be really nice, but a lot of the time I do not
want to use record-based processors, since I would need to define a schema for
the input/output up front when I just want to run a SQL query and get whatever
results come back. It just adds an extra step that can break and will need
support.

As with the Kafka processors, it is nice to have the choice between a
record-based processor and a message-oriented one. But if one processor can do it
all, it is even better :)
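
To illustrate the "run a query and take whatever comes back" idea outside of NiFi, here is a minimal sketch in Python. It is not how ExecuteSQL works internally. It only shows that column names can be read from the result set itself, so no schema has to be declared up front. The `users` table, the sample rows, and the helper name `query_to_json_records` are hypothetical and exist only for this example.

```python
import json
import sqlite3


def query_to_json_records(conn, sql):
    """Run a query and return one JSON string per row.

    Column names are taken from the cursor metadata, so the caller
    does not declare a schema before running the query.
    """
    cur = conn.execute(sql)
    # cursor.description has one 7-tuple per result column; item 0 is the name.
    columns = [col[0] for col in cur.description]
    return [json.dumps(dict(zip(columns, row))) for row in cur]


# Hypothetical in-memory table, used only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])

records = query_to_json_records(conn, "SELECT id, name FROM users ORDER BY id")
for line in records:
    print(line)
```

The trade-off is the one discussed in this thread: JSON produced this way carries column names but not the database types, which is what Avro with an embedded schema preserves.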


On Tue, Aug 7, 2018 at 9:28 AM Matt Burgess <mattyb149@apache.org> wrote:

> I'm definitely interested in supporting a record-aware version as well
> (I wrote the Jira up last year [1] but haven't gotten around to
> implementing it), however I agree with Peter's comment on the Jira.
> Since ExecuteSQL is an oft-touched processor, if we had two processors
> that only differed in how the output is formatted, it could be harder
> to maintain (bugs to be fixed in two places, e.g.). I think we should
> add an optional RecordWriter property to ExecuteSQL, and the
> documentation would reflect that if it is not set, the output will be
> Avro with embedded schema as it has always been. If the RecordWriter
> is set, either the schema can be hardcoded, or they can use "Inherit
> Record Schema" even though there's no reader, and that would mimic the
> current behavior where the schema is inferred from the database
> columns and used for the writer. There is precedent for this pattern
> in the SiteToSite reporting tasks.
>
> To Bryan's point about history, Avro at the time was the most
> descriptive of the solutions because it maintains the schema and
> datatypes with the data, unlike JSON, CSV, etc. Also before the record
> readers/writers, as Bryan said, you pretty much had to split,
> transform, merge. We just need to make that processor (and others with
> specific input/output formats) "record-aware" for better performance.
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-4517
> On Tue, Aug 7, 2018 at 9:20 AM Bryan Bende <bbende@gmail.com> wrote:
> >
> > I would also add that the pattern of splitting to 1 record per flow
> > file was common before the record processors existed, and generally
> > this can/should be avoided now in favor of processing/manipulating
> > records in place, and keeping them together in large batches.
> >
> >
> >
> > > On Tue, Aug 7, 2018 at 9:10 AM, Andrew Grande <aperepel@gmail.com> wrote:
> > > Careful, that makes too much sense, Joe ;)
> > >
> > >
> > > On Tue, Aug 7, 2018, 8:45 AM Joe Witt <joe.witt@gmail.com> wrote:
> > >>
> > >> i think we just need to make an ExecuteSqlRecord processor.
> > >>
> > >> thanks
> > >>
> > > >> On Tue, Aug 7, 2018, 8:41 AM Mike Thomsen <mikerthomsen@gmail.com> wrote:
> > > >>>
> > > >>> My guess is that it is due to the fact that Avro is the only record type
> > > >>> that can match sql pretty closely feature to feature on data types.
> > > >>> On Tue, Aug 7, 2018 at 8:33 AM Boris Tyukin <boris@boristyukin.com> wrote:
> > >>>>
> > > >>>> I've been wondering since I started learning NiFi why ExecuteSQL
> > > >>>> processor only returns AVRO formatted data. All community examples I've seen
> > > >>>> then convert AVRO to json and pretty much all of them then split json to
> > > >>>> multiple flows.
> > > >>>>
> > > >>>> I found myself doing the same thing over and over and over again.
> > > >>>>
> > > >>>> Since everyone is doing it, is there a strong reason why AVRO is liked
> > > >>>> so much? And why everyone continues doing this 3 step pattern rather than
> > > >>>> providing users with an option to output json instead and another option to
> > > >>>> output one flowfile or multiple (one per record).
> > >>>>
> > >>>> thanks
> > >>>> Boris
>
