nifi-dev mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: Coding a Processor that writes to multiple output flowfiles at once
Date Mon, 23 Nov 2015 14:15:02 GMT
Hi Salvatore,

Have you looked at the append() method on ProcessSession which lets you
append to the content of a FlowFile?

You should be able to create several new FlowFiles, and then while reading
lines from the incoming FlowFile, append the appropriate parts to each of
the new FlowFiles.
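
To illustrate the single-pass idea outside of NiFi, here is a minimal sketch in plain Java. It is not processor code: each ByteArrayOutputStream stands in for one of the new FlowFiles, and the write inside the loop marks the spot where a processor would call append() on the session. The class name ColumnSplitSketch, the method splitColumns, and the naive comma split are assumptions made for illustration only.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not part of NiFi.
class ColumnSplitSketch {

    // Reads the input exactly once and appends column i of every line to output i.
    static List<String> splitColumns(BufferedReader reader, int columnCount) throws IOException {
        // One buffer per column. In a processor, this would be one new FlowFile per column.
        List<ByteArrayOutputStream> outputs = new ArrayList<>();
        for (int i = 0; i < columnCount; i++) {
            outputs.add(new ByteArrayOutputStream());
        }

        String line;
        while ((line = reader.readLine()) != null) {
            // Naive split: assumes the data contains no quoted or escaped commas.
            String[] cells = line.split(",", -1);
            for (int i = 0; i < columnCount; i++) {
                // Short rows are padded with empty cells so all outputs keep the same line count.
                String cell = i < cells.length ? cells[i] : "";
                byte[] bytes = (cell + "\n").getBytes(StandardCharsets.UTF_8);
                // In a processor, this is where the append to the i-th FlowFile would happen.
                outputs.get(i).write(bytes, 0, bytes.length);
            }
        }

        // Convert each buffer to a String so the result is easy to inspect.
        List<String> result = new ArrayList<>();
        for (ByteArrayOutputStream out : outputs) {
            result.add(new String(out.toByteArray(), StandardCharsets.UTF_8));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("a1,b1\na2,b2\n"));
        List<String> columns = splitColumns(reader, 2);
        System.out.print(columns.get(0));
        System.out.print(columns.get(1));
    }
}
```

In a real processor the outputs would be FlowFiles created from the incoming one, and each would then be transferred to its own relationship or tagged with an attribute naming its column.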

An example processor that does something like this is the new RouteText
processor:
https://github.com/apache/nifi/blob/773576e041088d9e326f1d2e84b0ad8acbd6cfdc/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/RouteText.java#L485

Let us know if this helps.

Thanks,

Bryan


On Mon, Nov 23, 2015 at 2:40 AM, Salvatore Papa <salvatore.papa@gmail.com>
wrote:

> Heya NiFi devs,
>
> I'm having a bit of trouble trying to wrap my head around a valid way of
> tackling this problem with the available Processor templates. I'd like to
> split an input flowfile into N different flowfiles, one going to each of N
> relationships.
>
> A simplistic way of viewing it would be: A very large CSV file, with N
> columns, and I want to split each column into its own flowfile, and each of
> these flowfiles to its own relationship (or with an attribute saying which
> column it belongs to).
>
> Basic premise is for an example with two columns, and only two lines:
> * Read a line, write first column value to flowfile A, write second column
> value to flowfile B
> * Read next line, appending first column value to flowfile A, appending
> second column value to flowfile B
> Followed by one of:
> * Send flowfile A to relationship A, and send flowfile B to relationship B
> or
> * Set attribute "A" to flowfile A, attribute "B" to flowfile B, then send
> both A and B to a 'success' relationship.
>
> Unfortunately, I can't seem to find a way to write to multiple flowfiles at
> once, or at least, write to an outputstream for one flowfile, then write to
> another outputstream for another flowfile, then continue writing to the
> first flowfile.
>
> If they weren't such large files, I'd be okay with reading the input file N
> times, pulling out a different part each time, but I'd like to read each
> line (and by extension, the file) only once.
>
> I've written AbstractProcessors before for simple One-to-One
> transformations, and even Merge processors which are an extension of
> AbstractSessionFactoryProcessors to do Many-to-One, and even Split
> AbstractProcessors for One-to-Many in serial (splitting at different
> places, even with clone(flowfile, start, size)). But I can't work out a way
> to do this One-to-Many in parallel.
>
> Any ideas? Am I missing something useful? Do I just have to do it reading
> it multiple times? Just a really simple proof of concept explaining the
> design would be enough to get me started.
>
> Kind regards,
> Salvatore
>
