nifi-users mailing list archives

From Andre <andre-li...@fucs.org>
Subject Re: MergeContent: Correlation Attribute Name syntax for matching syslog events
Date Sun, 07 Feb 2016 23:09:56 GMT
Bryan,

> There is an attribute called "syslog.sender" which is the host that the
> message was received from, the value is populated from the incoming
> connection in Java code, not from anything in the syslog message.
> This should essentially be the host of the syslog server/forwarder.

Correct. I saw that when I was using your code as the basis for
ListenLumberjack. :-)

> There is an attribute called "syslog.hostname" which is the hostname in
> the syslog message itself, which should be the host that produced that
> message and sent it to a syslog server.

Saw that as well.

This is particularly handy when fitting NiFi into "existing syslog server"
scenarios:

Syslog Producers --> existing rsyslog / syslog-ng server --> log shipping
mechanism (e.g. Flume, filebeat, heka, MiNiFi)   --> NiFi

(The same can be said of the RouteText suggestion, together with batching,
as suggested below.)


> By default ListenSyslog has parse set to true and batch size set to 1. If
> you set parse to false and increase the batch size to say 100, it will try
> to grab a maximum of 100 messages in each execution of the processor
> (could be less depending on timing and what is available), and for those
> 100 messages it groups them by the "sender" (described above) and outputs
> a flow file per sender.
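
To make the batching behaviour described above concrete, here is a minimal
Python sketch. It is not NiFi's actual code; the function name, the
(sender, line) tuple shape, and the newline-joined output are assumptions
made purely for illustration.

```python
from itertools import islice


def group_batch_by_sender(messages, batch_size=100):
    """Group up to batch_size messages by sender (hypothetical helper).

    messages: iterable of (sender, raw_line) tuples in arrival order.
    Returns a dict mapping each sender to one newline-joined payload,
    loosely mirroring "one flow file per sender".
    """
    grouped = {}
    # Take at most batch_size messages; fewer if less is available.
    for sender, line in islice(messages, batch_size):
        grouped.setdefault(sender, []).append(line)
    # Within a sender, lines keep the order in which they were read.
    return {sender: "\n".join(lines) for sender, lines in grouped.items()}


batch = [("10.0.0.1", "a"), ("10.0.0.2", "b"), ("10.0.0.1", "c")]
payloads = group_batch_by_sender(batch)
```

In this sketch, order is only preserved within a single batch and a single
sender; it says nothing about ordering across batches or threads.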

I saw those options and, while inclined to use them, I was wondering: what
happens to ordering in this case?

If I take the paper Rainer Gerhards (of rsyslog fame) wrote in 2010 (
http://www.gerhards.net/download/LinuxKongress2010rsyslog.pdf ), message
ordering under multi-threaded environments can be particularly hard
(rsyslog itself doesn't seem to provide hard ordering guarantees).

So much so that the author clearly states: "so it is safe to assume that
in almost all practical cases, the sequence in which messages are stored or
emitted is not a proper indication of the order of events."
(I wonder if anyone has ever tried to use this statement in court in an
attempt to invalidate evidence :-) )

I still haven't tested it, but I would imagine that under multi-threaded
configurations, batching followed by RouteText would result in flowfiles
that are noticeably out of order?

> Batching can definitely get much higher throughput on ListenSyslog, but
> if you have to parse them later in the flow with ParseSyslog then you
> still need to get each message into its own FlowFile, which most likely
> entails SplitText with a line count of 1 and then ParseSyslog.
> I don't know if this turns out much better than just letting ListenSyslog
> parse them in the first place. If you are letting ListenSyslog do the
> parsing then you can increase the concurrent tasks on the processor which
> means more threads parsing syslog messages and outputting FlowFiles.
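
The split-then-parse step described above can be sketched roughly as follows.
This is an illustration only, not how SplitText or ParseSyslog are
implemented: the regex covers just a simple RFC 3164 style line, and the
function name and dictionary keys are hypothetical.

```python
import re

# Simplified pattern for "<PRI>Mmm dd hh:mm:ss hostname body" (assumption:
# RFC 3164 style input only; real parsers handle far more variants).
SYSLOG_3164 = re.compile(
    r"^<(?P<priority>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<body>.*)$"
)


def split_and_parse(payload):
    """Split a batched payload into lines and parse each one separately."""
    events = []
    for line in payload.splitlines():
        if not line.strip():
            continue  # skip blank lines between messages
        match = SYSLOG_3164.match(line)
        # Lines that do not match are kept unparsed rather than dropped.
        events.append(match.groupdict() if match else {"raw": line})
    return events


payload = (
    "<34>Oct 11 22:14:15 mymachine su: 'su root' failed\n"
    "<13>Feb  7 23:09:56 web01 app: started\n"
)
events = split_and_parse(payload)
```

Each parsed event carries its own hostname, which is the per-message value
that the grouping by sender does not provide.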

Correct. Another scenario is processing flowfiles from GetKafka, ListenHttp,
ListenLumberjack, etc. that contain syslog-formatted messages.
(Which I happen to be testing, hence the strange setup described
previously.)


Cheers
