nifi-users mailing list archives

From Márcio Faria
Subject Push x Pull ETL
Date Tue, 11 Oct 2016 21:17:53 GMT

Potential NiFi user here.

I'm trying to figure out if NiFi could be a good choice to replace our existing homemade ETL
system, which roughly works like this:

1) Either on demand or at periodic intervals, fetch fresh rows from one or more tables in the
source database and insert or update them into the destination database;

2) Run the jobs which depend on the most recent data, and generate files based on those;

3) Upload the generated files to an external server using HTTPS.
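
A minimal sketch of the three steps above as a single pull-style job, in Python with SQLite standing in for both databases. Everything here is an assumption made for illustration: the `items` table, its `id`, `value`, and `updated_at` columns, the `since` watermark, and the injected `upload` callable (which would be an HTTPS POST in practice). It is not how the existing system or NiFi works.

```python
import sqlite3


def sync_rows(source, dest, since):
    """Step 1: copy rows changed after `since` from source to dest (insert or update)."""
    rows = source.execute(
        "SELECT id, value, updated_at FROM items "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    ).fetchall()
    # INSERT OR REPLACE acts as an upsert because `id` is the primary key.
    dest.executemany(
        "INSERT OR REPLACE INTO items (id, value, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    dest.commit()
    return len(rows)


def build_report(dest):
    """Step 2: generate the file content from the refreshed destination data."""
    rows = dest.execute("SELECT id, value FROM items ORDER BY id").fetchall()
    return "\n".join(f"{row_id},{value}" for row_id, value in rows)


def run_pull_job(source, dest, since, upload):
    """Pull style: the report request triggers the refresh, then the upload."""
    synced = sync_rows(source, dest, since)
    report = build_report(dest)
    upload(report)  # Step 3: hypothetical uploader, injected so it can be stubbed
    return synced
```

The job is driven from the end (the report is wanted, so the data is refreshed first), which is the "pull" ordering described in the next paragraph.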

Since our use cases are more of a "pull" style (e.g., it's time to run the report -> get
the required data updated -> run the processing job and submit the results) than "push"
(e.g., get the latest data available -> when some condition is met, run the processing job
and submit the results), I'm wondering whether NiFi, or any other flow-based toolset for that
matter, would be a good option for us to try. Your opinion? Suggestions?

Besides, what is the recommended way to handle errors in an ETL scenario like that? For example,
we submit a "page" of rows to a remote server and its response tells us which of those rows
were accepted and which ones had a validation error. What would be the recommended approach
to handle such errors if the fix requires some human intervention? Is there a way of stopping
the whole flow until the correction is done? How can it be restarted when part of the data was
already processed by some of the processors? The server won't accept a transaction B if it
depends on a transaction A that wasn't successfully submitted before.
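
To make the ordering constraint concrete, here is a small Python sketch of one possible approach: submit pages in order, stop at the first rejected row, and return a checkpoint to resume from after a human fixes the data. The `submit_page` callable and its list-of-booleans response are hypothetical stand-ins for the remote server. The sketch also assumes that resending rows from the same page that were already accepted is harmless, which may not hold for a real server.

```python
def submit_in_order(rows, submit_page, page_size, start=0):
    """Submit rows page by page, halting at the first rejected row.

    `submit_page` is a hypothetical callable that takes a list of rows and
    returns one boolean per row (True = accepted).
    Returns the index of the first row that still needs to be sent; this
    equals len(rows) when every row was accepted.
    """
    position = start
    while position < len(rows):
        page = rows[position:position + page_size]
        results = submit_page(page)
        for accepted in results:
            if not accepted:
                # Halt here: later rows may depend on the rejected one.
                return position
            position += 1
    return position
```

After the correction, calling `submit_in_order(rows, submit_page, page_size, start=checkpoint)` with the returned index resumes at the row that failed instead of restarting from the beginning.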

As you see, our processing is very batch-oriented. I know NiFi can fetch data in chunks from
a relational database, but I'm not sure how to approach the conversion from our current style
to a more "stream"-oriented one. I'm afraid I could try to use the "right tool for the wrong
problem", if you know what I mean.

Apologies if this is not the proper venue to ask. I checked all the posts in this mailing
list and also tried to search for information elsewhere, but I wasn't able to find the answers.

Any guidance, like examples or links to further reading, would be very much appreciated. I'm
just starting to learn the ropes.

Thank you,
