nifi-users mailing list archives

From Andre <>
Subject Re: Spark or custom processor?
Date Thu, 02 Jun 2016 23:26:45 GMT

Your work stream is very similar to mine. NiFi will work fine by itself,
without the need for Spark (keep the Spark option around for other types of
workloads).

What we do is:

Syslog -> local disk -> logstash-forwarder (tail) -> ListenLumberjack
(PR290 -  experimental and not yet merged) -> ParseSyslog -> BlackMagicStuff
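The ParseSyslog step in that chain turns each raw line into structured fields. As a rough illustration (this is a minimal sketch of RFC 3164-style parsing, not NiFi's actual implementation; the regex and field names are mine):

```python
import re

# Illustrative RFC 3164-style pattern: "<PRI>MMM dd HH:MM:SS host body"
SYSLOG_3164 = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<body>.*)$"
)

def parse_syslog(line):
    """Return a dict of syslog fields, or None if the line doesn't match."""
    m = SYSLOG_3164.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    pri = int(fields.pop("pri"))
    # PRI encodes facility and severity: pri = facility * 8 + severity
    fields["facility"], fields["severity"] = divmod(pri, 8)
    return fields
```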

The reason we do it this way is to decouple the data flow from blocking
mechanisms such as RELP and Lumberjack; no matter what happens to the NiFi
cluster, you still have a copy of the data on disk for replay.

This is particularly relevant in environments where you would use TCP
syslog or any other protocol that can block when it is unable to push log
messages (search for the TCP syslog outage at Atlassian's cloud a few years
ago).
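One way to avoid that failure mode is exactly the write-first, forward-later pattern above: the shipper tails the on-disk spool and only advances a checkpoint after a successful send, so a blocked or crashed consumer just means replay. A minimal sketch of that tail-and-checkpoint idea (the file layout and helper names are hypothetical, not how logstash-forwarder is actually implemented):

```python
def forward_from_spool(path, offset_path, send):
    """Tail a spool file from the last checkpointed byte offset, sending
    each line downstream. The offset is only advanced after a successful
    send, so a crash or blocked consumer means replay from disk, not loss."""
    try:
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)
    except FileNotFoundError:
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line:
                break
            send(line)                      # may block or fail; data stays on disk
            offset = f.tell()
            with open(offset_path, "w") as o:
                o.write(str(offset))        # checkpoint only after success
    return offset
```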

We are not an Internet-scale shop, but we still have enough logs to make a
SIEM suffer, and in our opinion NiFi performs well.

For load balancing, any session-based TCP load balancer will help you
utilise all your NiFi nodes.
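The key property a session-based balancer gives you is affinity: every packet of one TCP session lands on the same NiFi node, while different clients spread across the cluster. One common way that is done is by hashing the client's source address; a toy sketch (node names are placeholders, and real balancers add health checks and weighting):

```python
import hashlib

def pick_node(client_addr, nodes):
    """Source-hash session affinity: connections from the same client
    address always map to the same node in the list."""
    digest = hashlib.sha256(client_addr.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]
```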

On 2 Jun 2016 23:28, "Conrad Crampton" <> wrote:


ListenSyslog (using the approach being discussed currently in another
thread – ListenSyslog running on the primary node with an RPG, and all
other nodes connecting to the port that the RPG exposes).

Various enrichment, routing on attributes etc., and finally into HDFS.
I want to branch off at an appropriate point in the flow and do some
further realtime analysis – I have the output port feeding a Spark process
working fine (notwithstanding the SSLContext issue you so kindly helped
with previously), but I am wondering whether this is the most appropriate
solution.

I have dabbled with a custom processor (for url splitting/enriching etc. –
in hindsight I probably could have done it with the ExecuteScript
processor), so I am comfortable going this route if that is deemed more
appropriate.


*From: *Bryan Bende <>
*Reply-To: *"" <>
*Date: *Thursday, 2 June 2016 at 13:12
*To: *"" <>
*Subject: *Re: Spark or custom processor?


I would think that you could do this all in NiFi.

How do the log files come into NiFi? TailFile, ListenUDP/ListenTCP?

On Thu, Jun 2, 2016 at 6:41 AM, Conrad Crampton <> wrote:


Any advice on the ‘best’ architectural approach where some processing
function has to be applied to every flow file in a dataflow, with some
(possible) output based on the flowfile content?

e.g. inspect log files for a specific IP, then send a message to syslog.

Approach 1


Output port from NiFi -> Spark listens to that stream -> processes and
outputs accordingly

Advantages – scale spark job on Yarn, decoupled (reusable) from NiFi

Disadvantages – adds complexity, decoupled from NiFi.

Approach 2


Custom processor -> PutSyslog

Advantages – reuse existing NiFi processors/ capability, obvious flow
(design intent)

Disadvantages – scale??

Any comments/ advice/ experience of either approach?
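Whichever route is chosen, the per-flowfile work in the worked example (inspect content for a specific IP, emit a syslog message) is small. Stripped of the NiFi API, the core of Approach 2 might be sketched as follows, with PutSyslog handling actual delivery (names and message format are illustrative only):

```python
import re

# naive dotted-quad matcher, good enough for a sketch
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def alerts_for(content, watched_ips):
    """Scan a flowfile's text for watched IPs and build one syslog-style
    alert message per hit -- the core of the custom processor -> PutSyslog
    route, minus the NiFi session plumbing."""
    hits = [ip for ip in IP_RE.findall(content) if ip in watched_ips]
    return ["<134>nifi-alert: watched ip %s seen" % ip for ip in hits]
```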



