We had been using the Kylo data ingest template to read data from our Oracle and DB2 databases and move it into HDFS and Hive.
The Kylo data ingest template also provided features to validate, profile, and split the data based on validation rules. We also built some custom processors and added them to the template.
We recently migrated to NiFi 1.9.0 (CDF), and a lot of the Kylo processors don't work there. We were able to get our custom processors working in 1.9.0, but the Kylo NAR files don't work, and I don't know if any workaround exists for that.
However, given that the Kylo project is dead, I don't want to depend on those Kylo NAR files and processors. What I want to understand is how to replicate that functionality using the standard processors available in NiFi.
Essentially, are there processors that allow me to do the following:
1. Read data from a database - I know QueryDatabaseTable. Are there others? How do I parameterize it so that I don't need a separate flow per table, i.e. how can we pass the table name when running the job? (Rough sketch of what I'm imagining below the list.)
2. Partition the data and convert it to Avro - I know SplitAvro, but does it also partition, and how do I pass the partition parameters? (Sketch below.)
3. Write data to HDFS and Hive - I know PutHDFS works for writing to HDFS, but for Hive should I use PutSQL after converting the Avro from step 2 to SQL statements, or is there a better option? Does that support upserts as well?
4. Apply validation rules to the data before it is written into Hive, e.g. by calling a custom Spark job that executes the validation rules and splits the data. Is there a processor that can help achieve this? (Sketch of the kind of job I mean below.)
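For point 1, this is roughly what I have in mind, though it is only a sketch: I am assuming the Table Name property accepts Expression Language, and source.table.name / source.table.maxvalue.column are attribute names I made up:

GenerateTableFetch
    Database Connection Pooling Service: DBCPConnectionPool (Oracle / DB2)
    Table Name: ${source.table.name}
    Maximum-value Columns: ${source.table.maxvalue.column}
-> ExecuteSQL (output is Avro)
-> downstream processing

The attributes would be set per run by an UpdateAttribute at the top of the flow (or from the variable registry), so the same flow could serve many tables.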
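For point 2, I was wondering whether the record-oriented processors are the direction to go, something like the following (again just a sketch; /load_date is a made-up field and I am assuming PartitionRecord takes a RecordPath as the value of a dynamic property):

ConvertRecord (Record Writer: AvroRecordSetWriter)
-> PartitionRecord
       partition.value: /load_date
-> route or merge on the resulting partition.value attribute

Is that the right replacement for what SplitAvro was doing in the Kylo template?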
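For point 4, the kind of validation job I would want to trigger from NiFi (e.g. via ExecuteStreamCommand running spark-submit, or ExecuteSparkInteractive with Livy) would look roughly like this PySpark sketch. The paths, column names and rules are placeholders, and I am assuming the spark-avro package is available:

# validate.py - split a staged dataset into valid and invalid records
import sys
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

source_path, valid_path, invalid_path = sys.argv[1:4]

spark = SparkSession.builder.appName("ingest-validation").getOrCreate()

# data staged by NiFi in step 2
df = spark.read.format("avro").load(source_path)

# example rules; in practice these would be driven by a rules table or config
is_valid = F.col("customer_id").isNotNull() & (F.length(F.col("country_code")) == 2)

df.filter(is_valid).write.mode("overwrite").parquet(valid_path)
df.filter(~is_valid).write.mode("overwrite").parquet(invalid_path)

spark.stop()

If there is a more NiFi-native way to do this kind of split, I would be glad to hear about it as well.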
I know a few users in this group have used Kylo on top of NiFi. It would be great if some of you could share your perspective as well.
Thanks in advance.