nifi-users mailing list archives

From Matt Burgess <mattyb...@apache.org>
Subject Re: AVRO is the only output format with ExecuteSQL
Date Tue, 07 Aug 2018 13:28:07 GMT
I'm definitely interested in supporting a record-aware version as well
(I wrote the Jira up last year [1] but haven't gotten around to
implementing it), however I agree with Peter's comment on the Jira.
Since ExecuteSQL is an oft-touched processor, if we had two processors
that only differed in how the output is formatted, it could be harder
to maintain (e.g., bugs would need fixing in two places). I think we should
add an optional RecordWriter property to ExecuteSQL, and the
documentation would reflect that if it is not set, the output will be
Avro with embedded schema as it has always been. If the RecordWriter
is set, either the schema can be hardcoded, or they can use "Inherit
Record Schema" even though there's no reader, and that would mimic the
current behavior where the schema is inferred from the database
columns and used for the writer. There is precedent for this pattern
in the SiteToSite reporting tasks.
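
The optional-writer idea can be sketched in plain Python, purely as an
illustration. This is not NiFi code: execute_sql, default_writer, and
json_writer are hypothetical names, sqlite3 stands in for the database,
and the default writer only imitates "Avro with embedded schema" by
keeping the schema next to the rows.

```python
import json
import sqlite3


def infer_schema(cursor):
    """Derive field names from the result set's columns (the 'inherit' case)."""
    return [col[0] for col in cursor.description]


def default_writer(schema, rows):
    # Hypothetical stand-in for Avro with embedded schema:
    # the schema travels together with the data.
    return json.dumps({"schema": schema, "rows": [list(r) for r in rows]})


def json_writer(schema, rows):
    # Hypothetical configured writer: all records stay in one output.
    return json.dumps([dict(zip(schema, r)) for r in rows])


def execute_sql(conn, query, record_writer=None, schema=None):
    """Run the query; fall back to the default format when no writer is set."""
    cursor = conn.execute(query)
    if schema is None:
        # No hardcoded schema: infer it from the database columns.
        schema = infer_schema(cursor)
    rows = cursor.fetchall()
    writer = record_writer if record_writer is not None else default_writer
    return writer(schema, rows)
```

Leaving record_writer unset keeps the existing behavior, so current
flows would not need to change; passing schema corresponds to the
hardcoded-schema case.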

To Bryan's point about history, Avro at the time was the most
descriptive of the solutions because it maintains the schema and
datatypes with the data, unlike JSON, CSV, etc. Also before the record
readers/writers, as Bryan said, you pretty much had to split,
transform, merge. We just need to make that processor (and others with
specific input/output formats) "record-aware" for better performance.
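
As a rough illustration of the difference (plain Python, not NiFi code;
both function names are made up for this sketch), the older pattern
serializes every record separately before merging, while a record-aware
step handles the whole batch in one pass:

```python
import json


def split_transform_merge(records, transform):
    # Old pattern: split into one serialized piece per record,
    # transform each piece on its own, then merge the results.
    pieces = [json.dumps(r) for r in records]                         # split
    changed = [json.dumps(transform(json.loads(p))) for p in pieces]  # transform
    return "[" + ",".join(changed) + "]"                              # merge


def record_aware(records, transform):
    # Record-aware pattern: transform records in place, keep them together,
    # and serialize the batch once.
    return json.dumps([transform(r) for r in records])
```

Both produce the same records; the second simply avoids the per-record
serialize/parse round trip.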

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-4517

On Tue, Aug 7, 2018 at 9:20 AM Bryan Bende <bbende@gmail.com> wrote:
>
> I would also add that the pattern of splitting to 1 record per flow
> file was common before the record processors existed, and generally
> this can/should be avoided now in favor of processing/manipulating
> records in place, and keeping them together in large batches.
>
>
>
> On Tue, Aug 7, 2018 at 9:10 AM, Andrew Grande <aperepel@gmail.com> wrote:
> > Careful, that makes too much sense, Joe ;)
> >
> >
> > On Tue, Aug 7, 2018, 8:45 AM Joe Witt <joe.witt@gmail.com> wrote:
> >>
> >> i think we just need to make an ExecuteSqlRecord processor.
> >>
> >> thanks
> >>
> >> On Tue, Aug 7, 2018, 8:41 AM Mike Thomsen <mikerthomsen@gmail.com> wrote:
> >>>
> >>> My guess is that it is due to the fact that Avro is the only record type
> >>> that can match sql pretty closely feature to feature on data types.
> >>> On Tue, Aug 7, 2018 at 8:33 AM Boris Tyukin <boris@boristyukin.com>
> >>> wrote:
> >>>>
> >>>> I've been wondering since I started learning NiFi why ExecuteSQL
> >>>> processor only returns AVRO formatted data. All community examples I've
> >>>> seen then convert AVRO to json and pretty much all of them then split
> >>>> json to multiple flows.
> >>>>
> >>>> I found myself doing the same thing over and over and over again.
> >>>>
> >>>> Since everyone is doing it, is there a strong reason why AVRO is liked
> >>>> so much? And why everyone continues doing this 3 step pattern rather
> >>>> than providing users with an option to output json instead and another
> >>>> option to output one flowfile or multiple (one per record).
> >>>>
> >>>> thanks
> >>>> Boris
