nifi-users mailing list archives

From Mike Thomsen <mikerthom...@gmail.com>
Subject Re: Data Ingestion using NiFi
Date Tue, 13 Aug 2019 11:35:03 GMT
Bimal,

1. Take a look at ExecuteSQLRecord and see if that works for you. I don't
use SQL databases that much, but it works like a charm for me and others
for querying and getting an inferred Avro schema based on the schema of the
database table (you can massage it into another format with ConvertRecord).
2. Take a look at QueryRecord and PartitionRecord, configured to use Avro
readers and writers.
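
To make the idea behind point 2 concrete, here is a rough conceptual sketch in Python. It is not NiFi code: in NiFi these steps are configured through processor properties (SQL queries for QueryRecord, RecordPath expressions for PartitionRecord), not written by hand. The function names, the field names (`region`, `amount`) and the validation rule are all hypothetical and only illustrate "split records by a rule, then group the good ones by a field value".

```python
from collections import defaultdict


def split_by_rule(records, is_valid):
    """Route each record to a 'valid' or 'invalid' list (hypothetical helper).

    Roughly the kind of split a validation query could produce.
    """
    valid, invalid = [], []
    for record in records:
        (valid if is_valid(record) else invalid).append(record)
    return valid, invalid


def partition_records(records, field):
    """Group records by the value of one field (hypothetical helper).

    Roughly what partitioning on a single field gives you: one group
    per distinct value.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record.get(field)].append(record)
    return dict(groups)


# Made-up sample rows standing in for records read from a database table.
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 3.0},
    {"id": 3, "region": "EU", "amount": 7.5},
    {"id": 4, "region": "US", "amount": -5.0},
]

# Assumed rule for illustration only: amounts must not be negative.
valid, invalid = split_by_rule(rows, lambda r: r["amount"] >= 0)
partitions = partition_records(valid, "region")
```

Each group in `partitions` would correspond to one output flowfile, which you could then write to a partition-specific location.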

Mike

On Tue, Aug 13, 2019 at 12:25 AM Bimal Mehta <bimal007@gmail.com> wrote:

> Hi NiFi users,
>
> We had been using the Kylo data ingest template to read data from our
> Oracle and DB2 databases and move it into HDFS and Hive.
> The Kylo data ingest template also provided some features to validate,
> profile and split the data based on validation rules. We also built some
> custom processors and added them to the template.
> We recently migrated to NiFi 1.9.0 (CDF), and a lot of Kylo processors
> don't work there. We were able to make our custom processors work in 1.9.0
> but the Kylo NAR files don't work. I don't know if any workaround exists
> for that.
>
> However, given that the Kylo project is dead, I don't want to depend on
> those Kylo NAR files and processors. What I want to understand is how to
> replicate that functionality using the standard processors available in
> NiFi.
>
> Essentially, are there processors that allow me to do the following:
> 1. Read data from a database - I know QueryDatabaseTable. Any others? How do
> I parameterize it so that I don't need to create one flow per table? How
> can we pass the table name while running the job?
> 2. Partition and convert to Avro - I know SplitAvro, but does it partition
> as well, and how do I pass the partition parameters?
> 3. Write data to HDFS and Hive - I know PutHDFS works for writing to HDFS,
> but should I use PutSQL for Hive by converting the Avro from step 2 to SQL?
> Or is there a better option? Does this support upserts as well?
> 4. Apply validation rules to the data before it is written into Hive, such
> as calling a custom Spark job that will execute the validation rules and
> split the data. Is there any processor that can help achieve this?
>
> I know a few users in this group have used Kylo on top of NiFi. It would be
> great if some of you could provide your perspective as well.
>
> Thanks in advance.
>
> Bimal Mehta
>
