nifi-users mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: Merging Records
Date Mon, 12 Jun 2017 20:02:40 GMT
Mika,

Are you receiving the log messages using the ListenTCP processor?

If so, just wanted to mention that there is a property "Max Batch
Size" that defaults to 1 and will control how many logical TCP
messages can be written to a single flow file.

If you increase that to, say, 1000, then you can send a flow file with
1000 log messages to the next record-based processor with the
GrokReader.
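
To illustrate the effect of batching, here is a minimal Python sketch. It is a conceptual illustration only, not NiFi code or how ListenTCP is implemented: the function name `batch_messages` and its parameters are hypothetical, and it simply assumes newline-delimited log messages that are grouped into payloads of up to a maximum batch size, each payload standing in for one flow file.

```python
from typing import Iterable, Iterator, List


def batch_messages(messages: Iterable[str], max_batch_size: int = 1000,
                   demarcator: str = "\n") -> Iterator[str]:
    """Yield payloads that each hold up to max_batch_size messages.

    Hypothetical helper: mimics the idea of writing several logical
    messages into one flow file, joined by a demarcator.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batch: List[str] = []
    for message in messages:
        batch.append(message)
        if len(batch) == max_batch_size:
            yield demarcator.join(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        yield demarcator.join(batch)


# 2500 single log events become three payloads instead of 2500.
events = [f"log event {i}" for i in range(2500)]
payloads = list(batch_messages(events, max_batch_size=1000))
print([len(p.split("\n")) for p in payloads])  # prints [1000, 1000, 500]
```

With a batch size of 1, every message would end up in its own payload, which is the situation described in the original question.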

-Bryan


On Mon, Jun 12, 2017 at 3:51 PM, Mark Payne <markap14@hotmail.com> wrote:
> Mika,
>
> Understood. The JIRA for this is NIFI-4060 [1]. MergeContent is likely the
> best option for the short-term,
> merging with a demarcator of \n (you can press Shift + Enter/Return to
> insert a new-line in the UI), if that
> works for your format.
>
> Thanks
> -Mark
>
>
> [1] https://issues.apache.org/jira/browse/NIFI-4060
>
>
> On Jun 12, 2017, at 3:36 PM, Mika Borner <nifi@my2ndhead.com> wrote:
>
> Hi Mark
>
> Yes, this makes sense.
>
> In my case, I'm receiving single log events from a TCP input which I would
> like to process further with record processors. This is probably an edge
> case where a record merger would make sense to make the post-processing more
> efficient.
>
> Good to hear it's already on the radar :-)
>
> Mika
>
>
>
> On 06/12/2017 09:23 PM, Mark Payne wrote:
>
> Hi Mika,
>
> You're correct that there is not yet a MergeRecord processor. It is on my
> personal radar,
> but I've not yet gotten to it. One of the main reasons that I've not
> prioritized this yet is that
> typically in this record-oriented paradigm, you'll see data coming in, in
> groups and being
> processed in groups. MergeContent largely has been useful in cases where we
> split data
> apart (using processors like SplitText, for example), and then merge it back
> together later.
> I don't see this as being quite as prominent when using record readers and
> writers, as the
> readers are designed to handle streams of data instead of individual records
> as FlowFiles.
>
> That being said, there are certainly cases where MergeRecord still makes
> sense. For example,
> when you're ingesting small payloads or want to batch up to send to
> something like HDFS, which
> prefers larger files, etc. So I'll hopefully have a chance to start working
> on that this week or next.
>
> In the meantime, the best path forward for you may be to use MergeContent
> to concatenate a bunch
> of data before the processor that is using the Grok Reader. Or, if you are
> splitting the data up
> into individual records yourself, I would recommend not splitting them up at
> all.
>
> Does this make sense?
>
> Thanks
> -Mark
>
>
> On Jun 12, 2017, at 3:12 PM, Mika Borner <nifi@my2ndhead.com> wrote:
>
> Hi,
>
> What is the best way to merge records? I'm using a GrokReader that spits
> out single JSON records. For efficiency I would like to merge a few hundred
> records into one flowfile. It seems there's no MergeRecord processor yet...
>
> Thanks!
>
> Mika
>
>
>
