nifi-users mailing list archives

From Matt Burgess <mattyb...@apache.org>
Subject Re: Attribute level interlinked CSV file import
Date Mon, 26 Feb 2018 13:30:34 GMT
Mausam,

You could use PutFile to store the Category CSV, then use
LookupRecord with either a CSVRecordLookupService or a
SimpleCsvLookupService. The former fetches multiple fields
from the lookup; the latter does a single-value lookup. You'll also
need a CSVReader to read in the data and a CSVRecordSetWriter (or some
other writer if you are converting the format).
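
To make the lookup step concrete, here is a minimal Python sketch of
what the enrichment amounts to. It is not NiFi code and does not use
any NiFi API; it only illustrates the join that the lookup performs.
The sample data, the column names (category_id, category_name, and so
on) and the helper functions load_lookup and enrich are hypothetical.

```python
import csv
import io

# Hypothetical sample data; the column names are assumptions for illustration.
CATEGORY_CSV = """category_id,category_name
10,Books
20,Toys
"""

PRODUCT_CSV = """product_id,product_name,category_id
1,Atlas,10
2,Robot,20
3,Mystery item,99
"""


def load_lookup(text, key_field):
    """Index the lookup CSV by its key column, one row (dict) per key."""
    return {row[key_field]: row for row in csv.DictReader(io.StringIO(text))}


def enrich(records_text, lookup, key_field, extra_fields):
    """Copy the extra fields from the matching lookup row into each record."""
    enriched = []
    for record in csv.DictReader(io.StringIO(records_text)):
        match = lookup.get(record[key_field])
        for field in extra_fields:
            # Unmatched keys get None in this sketch; how a real flow
            # handles them is a separate design decision.
            record[field] = match[field] if match else None
        enriched.append(record)
    return enriched


categories = load_lookup(CATEGORY_CSV, "category_id")
products = enrich(PRODUCT_CSV, categories, "category_id", ["category_name"])
for product in products:
    print(product)
```

Copying several fields from the matched row corresponds roughly to the
multi-field lookup; returning one value per key corresponds to the
single-value one.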

For the input format, if all fields are strings you can configure the
reader to "Use String Fields From Header"; that assumes a header
line and treats every field as a String. If the fields are of
primitive types (String, int, float), you can use InferAvroSchema first
to put the schema into the "avro.schema" attribute, then configure the
reader with "Use Schema Text" as the access strategy and ${avro.schema}
as the Schema Text property.

For the writer, you need to provide the adjusted schema (with the added
output fields from the Category CSV), so you can't use "Inherit Record
Schema" as the access strategy in the writer. Alternatively, I
suggest you explicitly create the correct input and output schemas.
You can either paste them directly into the "Schema Text" property of
the reader and writer, or set up an AvroSchemaRegistry and add the
schemas as named user-defined properties (see the documentation for
more details). With a registry, choose "Use Schema Name" as the access
strategy and put the schema's name from the registry in the Schema
Name property.
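
As a rough illustration of what "input schema plus added output
fields" could look like, the Python sketch below builds two Avro
record schemas as JSON. The record names, the field names and types,
and the choice to make every field nullable are assumptions made for
the example, not something taken from your data.

```python
import json


def field(name, avro_type):
    """Describe a nullable Avro field (a union with null, defaulting to null)."""
    return {"name": name, "type": ["null", avro_type], "default": None}


# Hypothetical input schema for the product CSV.
input_schema = {
    "type": "record",
    "name": "Product",
    "fields": [
        field("product_id", "int"),
        field("product_name", "string"),
        field("category_id", "int"),
    ],
}

# Output schema: the same fields plus the one added by the lookup.
output_schema = dict(
    input_schema,
    name="ProductWithCategory",
    fields=input_schema["fields"] + [field("category_name", "string")],
)

print(json.dumps(input_schema, indent=2))
print(json.dumps(output_schema, indent=2))
```

The printed JSON is the kind of text that would go into "Schema Text",
or into a user-defined property of the registry, once it is adjusted
to the real field names and types.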

If storing the CSV as a file is not prudent, you can (currently) use
MongoDB to persist it and use MongoDBLookupService; the same goes for
HBase. In the future I hope we have an RDBMSLookupService to look up
records from an RDBMS table, and possibly a Redis-backed one or
anything the community would like to contribute :)

Regards,
Matt


On Mon, Feb 26, 2018 at 5:04 AM, mausam <mausam4u@gmail.com> wrote:
> Hi,
>
> I am trying to use Nifi+Kafka to import multiple CSV files into my
> application.
>
> Problem statement:
> Some of these CSV files are interlinked on attribute level.
>
> For example, the product CSV has a reference to Category CSV.
> Or, the Price CSV has a reference to Product CSV.
>
> Also, it is possible, that the Category CSV comes only once in the beginning
> and subsequently, product CSV comes for months. In such case, I need to
> store the Category CSV data in Nifi for future references.
>
> I am able to create a flow with all independent files but am not able to
> solve the file interlinking.
>
> Queries:
>
> 1. Is there any out of the box processor that can help implement this?
> 2. Do I need to use a DB (like mongodb) to persist data for future
> reference?
>
> Thanks in advance.
>
> -Mausam
>
>
>
>
