nifi-users mailing list archives

From Matt Burgess <mattyb...@apache.org>
Subject Re: Question about NiFi and bulk database inserts - is PutSQL the only out of the box option?
Date Thu, 09 Aug 2018 15:03:50 GMT
Bob,

Unless you already have SQL in your flow files, I always recommend
PutDatabaseRecord [1] over PutSQL. The former is basically a mashup of
ConvertJSONToSQL -> PutSQL, but takes in data in any format supported by
our record readers (CSV, Avro, XML, JSON, etc.) and takes care of all the
SQL generation (and prepared statement stuff) under the hood. You should
find it a lot easier to work with, and a lot faster than the older
alternative, especially since PutDatabaseRecord is able to deal with an
entire set of rows/records in one flow file, rather than having to split a
large CSV file into individual rows just to get individual SQL statements.
If you try it out, please let us know if you run into any issues; I will do
my best to help get you up and going.
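
For illustration, a rough PutDatabaseRecord setup for a CSV feed might look
something like the following (the CSVReader service, schema, and table name
here are just placeholders for your own; property names are from the 1.7.x
docs):

Record Reader                       = CSVReader (configured for your columns)
Database Connection Pooling Service = your PostgreSQL DBCPConnectionPool
Statement Type                      = INSERT
Schema Name                         = public
Table Name                          = employee

With that in place, one flow file holding thousands of CSV rows is turned
into a single batch of prepared-statement executions, and the JDBC types are
worked out from the record schema and the target table rather than from
sql.args.N.type attributes.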

Regards,
Matt

[1]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.7.1/org.apache.nifi.processors.standard.PutDatabaseRecord/index.html
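
One more note on the sql.args.N.type values from your message below: those
are the numeric constants from java.sql.Types, so for example:

CHAR      = 1
VARCHAR   = 12
INTEGER   = 4
BIGINT    = -5
NUMERIC   = 2
DATE      = 91
TIMESTAMP = 93

A char(2) column would therefore map to type 1 (CHAR), although many JDBC
drivers will also accept 12 (VARCHAR) for char columns. Either way,
PutDatabaseRecord spares you that bookkeeping entirely.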


On Thu, Aug 9, 2018 at 10:52 AM Kuhfahl, Bob <rkuhfahl@mitre.org> wrote:

> I’m trying to get bulk inserts going using the PutSQL processor, but it’s
> starting to get ugly, so I need to reach out and see if any of you have
> been down this path.
>
>
>
> If you have, here’s some info.  If not, thanks for reading this far ☺
>
>
>
> Background:
>
> Legacy database migration ETL task.  Extract from one database, do a bunch
> of transformations, then load it all into a PostgreSQL repo.
>
> We have hundreds of tables with obviously many record structures _and a ton
> of data_.
>
>
>
> According to:
>
>
> https://community.hortonworks.com/articles/91849/design-nifi-flow-for-using-putsql-processor-to-per.html
>
>
>
> PutSQL, to do batch inserts, seems to want the form of the SQL statement
> to be identical for each record type.
>
> e.g. Insert into Employee ("name", "job title") VALUES (?,?)
>
>
>
> Easy enough to build that, *but* then it needs attributes for all the
> values and types in the flow file.
>
> e.g.
>
> 1.  sql.args.1.value = Bryan B
>
> 2.  sql.args.2.value = Director
>
> Use the UpdateAttribute processor to set the sql.args.N.type flow file
> attributes
>
> 1.  sql.args.1.type = 12 (VARCHAR)
>
> 2.  sql.args.2.type = 12
>
>
>
> THIS implies my flow will need to create a couple of attributes for every
> single field in the dataflow, AND I’ll have to come up with logic to
> determine what each data type is…
>
>
>
> I’m a newbie at this NiFi stuff, but that really does _not_ feel like I’m
> going down a good path.
>
> I’m hand-jamming a proof of concept just to validate the above, but I’m
> having a hard time lining up the data types… (e.g., the database has a
> char(2) field; I’ve tried char, tried varchar, …)
>
>
>
> The other SQL “insert-able” processors seem to want to read a file instead
> of a flow file, but I could easily be missing something.
>
> Suggestions would be appreciated!
>
>
>
