nifi-users mailing list archives

From Bimal Mehta <bimal...@gmail.com>
Subject Re: Data Ingestion using NiFi
Date Tue, 13 Aug 2019 15:45:33 GMT
Thanks Mike.
ExecuteSQL looks good and I am trying it out.

Also, I wanted to understand how we can trigger the NiFi jobs from DevOps
tools like CloudBees/ElectricFlow.
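
One option I came across is calling the NiFi REST API from the pipeline step
to start a process group; would something like this minimal Python sketch be
a reasonable way to do it? (The host, port, and process group ID below are
placeholders, and a secured cluster would also need a token or certificate.)

import requests

NIFI_API = "http://nifi-host:8080/nifi-api"          # placeholder host/port
PROCESS_GROUP_ID = "replace-with-process-group-id"   # placeholder ID

# Schedule every component in the process group by setting its state to
# RUNNING; setting it back to STOPPED stops the flow again.
resp = requests.put(
    f"{NIFI_API}/flow/process-groups/{PROCESS_GROUP_ID}",
    json={"id": PROCESS_GROUP_ID, "state": "RUNNING"},
)
resp.raise_for_status()
print("Started process group, HTTP", resp.status_code)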

On Tue, Aug 13, 2019 at 7:35 AM Mike Thomsen <mikerthomsen@gmail.com> wrote:

> Bimal,
>
> 1. Take a look at ExecuteSQLRecord and see if that works for you. I don't
> use SQL databases that much, but it has worked like a charm for me and
> others for querying and getting an inferred Avro schema based on the schema
> of the database table (you can massage it into another format with
> ConvertRecord). There is a quick schema-inspection sketch after point 2.
> 2. Take a look at QueryRecord and PartitionRecord, configured to use Avro
> readers and writers.
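>
> If you want to sanity-check the inferred schema, one option is to export a
> flowfile from the queue and dump the embedded writer schema; a rough Python
> sketch assuming the fastavro library (not part of NiFi):
>
> from fastavro import reader
>
> with open("flowfile.avro", "rb") as fo:
>     avro_reader = reader(fo)
>     print(avro_reader.writer_schema)  # schema inferred from the DB table
>     for record in avro_reader:
>         print(record)                 # each row comes back as a dict
>         break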
>
> Mike
>
> On Tue, Aug 13, 2019 at 12:25 AM Bimal Mehta <bimal007@gmail.com> wrote:
>
>> Hi NiFi users,
>>
>> We had been using the Kylo data ingest template to read data from our
>> Oracle and DB2 databases and move it into HDFS and Hive.
>> The Kylo data ingest template also provided features to validate, profile,
>> and split the data based on validation rules. We also built some custom
>> processors and added them to the template.
>> We recently migrated to NiFi 1.9.0 (CDF), and a lot of the Kylo processors
>> don't work there. We were able to make our custom processors work in 1.9.0,
>> but the Kylo NAR files don't, and I don't know whether any workaround
>> exists for that.
>>
>> However, given that the Kylo project is dead, I don't want to depend on
>> those Kylo NAR files and processors. What I want to understand is how to
>> replicate that functionality using the standard processors available in
>> NiFi.
>>
>> Essentially, are there processors that allow me to do the following?
>> 1. Read data from a database - I know QueryDatabaseTable. Any others? How
>> do I parameterize it so that I don't need to create one flow per table?
>> How can we pass the table name when running the job?
>> 2. Partition the data and convert it to Avro - I know SplitAvro, but does
>> it also partition, and how do I pass the partition parameters?
>> 3. Write data to HDFS and Hive - I know PutHDFS works for writing to HDFS,
>> but should I use PutSQL for Hive by converting the Avro from step 2 to
>> SQL? Or is there a better option? Does this support upserts as well?
>> 4. Apply validation rules to the data before it is written into Hive,
>> e.g. by calling a custom Spark job that executes the validation rules and
>> splits the data (a simplified sketch of that step follows this list). Is
>> there a processor that can help achieve this?
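>>
>> To make point 4 concrete, the Spark step is roughly the following (a
>> simplified PySpark sketch; the paths, table names, and the rule itself are
>> placeholders rather than our actual code, and reading "avro" assumes the
>> spark-avro package is on the classpath):
>>
>> from pyspark.sql import SparkSession
>> from pyspark.sql import functions as F
>>
>> spark = SparkSession.builder.appName("ingest-validation").getOrCreate()
>>
>> # Read the staged Avro files produced by the ingest flow.
>> df = spark.read.format("avro").load("/staging/customer/*.avro")
>>
>> # Example rule: the key column must be present and amounts non-negative.
>> is_valid = F.col("customer_id").isNotNull() & (F.col("amount") >= 0)
>>
>> # Split into valid and invalid records, each landing in its own Hive table.
>> df.filter(is_valid).write.mode("append").saveAsTable("db.customer_valid")
>> df.filter(~is_valid).write.mode("append").saveAsTable("db.customer_invalid")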
>>
>> I know a few users in this group have used Kylo on top of NiFi. It would
>> be great if some of you could share your perspective as well.
>>
>> Thanks in advance.
>>
>> Bimal Mehta
>>
>
