nifi-users mailing list archives

From Bimal Mehta <bimal...@gmail.com>
Subject Data Ingestion using NiFi
Date Tue, 13 Aug 2019 04:24:46 GMT
Hi NiFi users,

We have been using the Kylo data ingest template to read data from our
Oracle and DB2 databases and move it into HDFS and Hive.
The template also provided features to validate, profile, and split the
data based on validation rules. We also built some custom processors
and added them to the template.
We recently migrated to NiFi 1.9.0 (CDF), and many of the Kylo
processors don't work there. We were able to make our custom processors
work in 1.9.0, but the Kylo NAR files don't, and I don't know if any
workaround exists for that.

However, given that the Kylo project is dead, I don't want to depend on
the Kylo NAR files and processors. What I want to understand is how to
replicate that functionality using the standard processors available in
NiFi.

Essentially, are there processors that allow me to do the following?
1. Read data from a database. I know QueryDatabaseTable; is there
anything else? How do I parameterize it so that I don't need to build
one flow per table, and how can we pass the table name at run time?
(See sketch 1 below for what I have in mind.)
2. Partition the data and convert it to Avro. I know SplitAvro, but
does it also partition, and how do I pass the partition parameters?
(Sketch 2.)
3. Write data to HDFS and Hive. I know PutHDFS works for HDFS, but
should I use PutSQL for Hive by converting the Avro from step 2 to SQL,
or is there a better option? Does it support upserts as well?
(Sketch 3.)
4. Apply validation rules to the data before it is written into Hive,
e.g. by calling a custom Spark job that executes the validation rules
and splits the data. Is there a processor that can help with this?
(Sketch 4.)
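
Sketch 1, for the parameterized reads: an untested idea based on my
reading of the docs (db.table.name is the attribute ListDatabaseTables
appears to write on each outgoing flow file):

  ListDatabaseTables  -> emits one flow file per table, carrying a
                         db.table.name attribute
  GenerateTableFetch  -> Table Name = ${db.table.name}; generates the
                         paged SELECT statements
  ExecuteSQL          -> runs the incoming SQL and outputs Avro

Would a chain like that remove the need for one flow per table?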
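
Sketch 2, for the partitioning: since ExecuteSQL already outputs Avro,
maybe PartitionRecord with an AvroReader/AvroRecordSetWriter pair does
the job (the /load_date field below is made up):

  PartitionRecord
    Record Reader  = AvroReader
    Record Writer  = AvroRecordSetWriter
    partition_date = /load_date   (a dynamic property whose value is a
                                   RecordPath; each outgoing flow file
                                   then gets a matching partition_date
                                   attribute usable in the HDFS path)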
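
Sketch 3, for the Hive upserts: my understanding is that an upsert
needs a transactional (ACID) Hive table plus a MERGE statement, which
could be issued through PutHiveQL; the table and column names here are
made up:

  MERGE INTO target_table t
  USING staging_table s
  ON t.id = s.id
  WHEN MATCHED THEN UPDATE SET value = s.value
  WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.value);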
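
Sketch 4, for the validation step: could ExecuteStreamCommand wrap a
spark-submit call? The paths, class name, and jar below are all made
up:

  ExecuteStreamCommand
    Command Path      = /usr/bin/spark-submit
    Command Arguments = --class;com.example.ValidateAndSplit;
                        /opt/jobs/validator.jar;${db.table.name}
                        (the default argument delimiter is ';')

I have also seen ExecuteSparkInteractive mentioned for submitting Spark
code through Livy, if that is a better fit.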

I know a few users in this group have used Kylo on top of NiFi. It
would be great if some of you could share your perspective as well.

Thanks in advance.

Bimal Mehta
