nifi-dev mailing list archives

From Matt Burgess <>
Subject Re: Data Load
Date Tue, 02 May 2017 16:07:31 GMT
Yes, that sounds like your best bet, assuming the table you want to
migrate has a column you can use as the "Maximum Value Column".  Then
a flow might look like:

QueryDatabaseTable -> ConvertAvroToJSON -> ConvertJSONToSQL -> PutSQL
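
For reference, here is roughly the query QueryDatabaseTable ends up
issuing when a Maximum Value Column is configured (the "orders" table
and "order_id" column below are just made-up examples):

    -- First run: no predicate, so the whole table is fetched.
    -- Later runs: only rows whose order_id is greater than the
    -- largest value recorded in the processor's state (say 12345).
    SELECT * FROM orders WHERE order_id > 12345;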

In this flow the target tables would need to be created beforehand.
You might be able to do that with pg_dump, or with some fancy SQL that
you could send to PutSQL in a separate (do-ahead) flow [1]; there is a
small sketch of that at the end of this message.  For multiple tables,
you will need one QueryDatabaseTable for each table; depending on the
number of tables and the latency for getting/putting rows, you may be
able to share the downstream processing. If that creates a bottleneck,
you may want a copy of the above flow for each table.  This is
drastically improved in NiFi 1.2.0, where you can use
ListDatabaseTables -> GenerateTableFetch -> RPG (Remote Process Group)
-> Input Port -> ExecuteSQL to perform the migration in parallel
across a NiFi cluster.
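
As for the do-ahead flow mentioned above, a minimal sketch (again with
made-up table and column names) is just idempotent DDL that you feed
to PutSQL before the data starts moving:

    -- Pre-create the target table on the cloud Postgres, assuming it
    -- should mirror the on-prem schema.
    CREATE TABLE IF NOT EXISTS orders (
        order_id   BIGINT PRIMARY KEY,
        customer   VARCHAR(100),
        amount     NUMERIC(12,2),
        created_at TIMESTAMP
    );

Alternatively, a schema-only pg_dump from the on-prem database,
replayed against the cloud database, would accomplish the same thing
outside of NiFi.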



On Tue, May 2, 2017 at 11:18 AM, Anil Rai <> wrote:
> Thanks Matt for the quick reply. We are using the NiFi 1.0 release as
> of now. It's a Postgres DB on both sides (on prem and on cloud), and
> yes, incremental load is what I am looking for. So with that, you
> recommend option #2?
> On Tue, May 2, 2017 at 11:00 AM, Matt Burgess <> wrote:
>> Anil,
>> Is this a "one-time" migration, meaning you would take the on-prem
>> tables and put them on the cloud DB just once? Or would this be an
>> incremental operation, where you do the initial move and then take any
>> "new" rows from the source and apply them to the target?  For the
>> latter, there are a couple of options:
>> 1) Rebuild the cloud DB periodically. You can use ExecuteSQL ->
>> [processors] -> PutSQL after perhaps deleting your target
>> DB/tables/etc.  This could be time-consuming and expensive. The
>> processors in-between probably include ConvertAvroToJSON and
>> ConvertJSONToSQL.
>> 2) Use QueryDatabaseTable or (GenerateTableFetch -> ExecuteSQL) to get
>> the source data. For this, your table would need a column whose values
>> always increase; that column would be the value of the "Maximum
>> Value Column" property in the aforementioned processors' configuration
>> dialogs. You would need one QueryDatabaseTable or GenerateTableFetch
>> for each table in your DB.
>> In addition to these current solutions, as of the upcoming NiFi 1.2.0
>> release, you have the following options:
>> 3) If the source database is MySQL, you can use the CaptureChangeMySQL
>> processor to get binary log events flowing through various processors
>> into PutDatabaseRecord to apply them at the target. This pattern is
>> true Change Data Capture (CDC), versus the other two options above.
>> 4) Option #2 will be improved such that GenerateTableFetch will accept
>> incoming flow files, so you can use ListDatabaseTables ->
>> GenerateTableFetch -> ExecuteSQL to handle multiple tables with one
>> flow.
>> If this is a one-time migration, a data flow tool might not be the
>> best choice; you could consider something like Flyway [1] instead.
>> Regards,
>> Matt
>> [1]
>> On Tue, May 2, 2017 at 10:41 AM, Anil Rai <> wrote:
>> > I have a simple use case.
>> >
>> > DB (On Premise) and DB (On Cloud).
>> >
>> > I want to use nifi to extract data from on prem DB (huge volumes) and
>> > insert into the same table structure that is hosted on cloud.
>> >
>> > I could use ExecuteSQL on both sides of the fence (to extract from
>> > on prem and insert onto cloud). What processors are needed in
>> > between (if at all)? As I am not doing any transformations, it is
>> > just an extract and load use case.
