nifi-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Peter Wicks (pwicks)" <pwi...@micron.com>
Subject RE: [EXT] NiFi 1.3: Simplest way possible of creating CSV files from SQL queries
Date Tue, 01 Aug 2017 08:19:59 GMT
I hate to respond with “me too”, but I haven’t seen a response and this kind of simplification
is of interest to me.

The PutDatabaseRecord processor already does something similar, and I have only needed the
AvroReader processor without a schema registry.


From: Márcio Faria [mailto:faria.marcio@ymail.com]
Sent: Tuesday, July 25, 2017 11:09 AM
To: Users <users@nifi.apache.org>
Subject: [EXT] NiFi 1.3: Simplest way possible of creating CSV files from SQL queries

Hi,

I'm looking for the simplest way possible of creating CSV files from SQL queries using Apache
NiFi 1.3.

The flow I currently have (the files are to be SFTP'ed to a remote server):

ExecuteSQL -> UpdateAttribute -> ConversionRecord [3 CSs] -> PutSFTP

The concept of SchemaRegistry is new to me, but if I understood it correctly in order for
the ConversionRecord to work properly is necessary to have 3 Controller Services ([3 CSs])
associated with it:

  *   AvroSchemaRegistry, with the schema defined in Avro Schema (JSON);
  *   AvroReader, referring to the above schema;
  *   CSVRecordSetWriter, also referring to the same schema.

It seems there are many benefits in using the schema registry, including versioning, validation,
etc, but in my example a simpler configuration would be welcome.

Isn't the schema already defined by ExecuteSQL? Can I have the ConversionRecord alone with
no dedicated SchemaRegistry (property), AvroReader,(instance), or CSVRecordSetWriter (instance)?
Of course, we'd still need to specify the output is a CSV, so perhaps a shared CSVRecordSetWriter
that also gets its schema from the flow file would still be useful.

By the way, would the Schema Access Strategy named "Use Embedded Avro Schema" be part of a
simpler solution? How?

In the same vein, what about having the schema-name property optionally defined by the ExecuteSQL
itself, so we don't have to depend on the UpdateAttribute component?

In summary, I'm wondering if it's possible to have 3 (+ 1 generic) components instead of 6
per query:

ExecuteSQL -> ConversionRecord [CSVRecordSetWriter] -> PutSFTP

That would make a difference when defining multiple conversions from SQL to CSV, or other
equivalent flows.

In addition, consider that someone might want to have maximum flexibility, meaning that it
would be totally acceptable to change the query and get a different layout for the resulting
CSV file, without having to change any SchemaRegistry, Reader, or Writer.

I've found a few tickets out there covering a similar topic. In particular, [1] mentions the
difficulty with more complex Avro data types. But I don't see that being a blocker when the
data source is an old-fashioned SQL query.

Recommendations?

P.S.1 Maybe templates would save the effort, but since Controller Services are "global", I'm
still wondering if having too many parts would make it more difficult to manage lots of flows
than it could be.

P.S.2 Will my 1st flow have a good performance? I'm wondering if another advantage of using
SchemaRegistry etc is that it prevents the creation of too many records at once.

Thank you,

Marcio

[1] NIFI-1372 Create ConvertAvroToCSV<https://issues.apache.org/jira/browse/NIFI-1372>


[NIFI-1372] Create ConvertAvroToCSV - ASF JIRA



Mime
View raw message