nifi-users mailing list archives

From Andrew Psaltis <>
Subject Re: Logstash/ Filebeat/ Lumberjack -> Nifi
Date Sat, 07 May 2016 15:43:09 GMT
Hi Conrad,
Based on your email it sounds like you are potentially just getting started
with Logstash. One thing I can share is that until recently I worked in an
environment where we had ~3,000 nodes deployed, all running either Logstash
or Flume (we were transitioning to Logstash). We used Puppet, and the
Logstash module was in the base templates, so as app developers provisioned
new nodes Logstash was automatically deployed and configured. I can tell
you that it seems really easy at first; however, my team was constantly
tweaking and troubleshooting the Logstash scripts as we wanted to ingest
different data sources, change how the data was captured, or fix bugs.
Knowing what I now do about NiFi, if I had the chance to do it over again
(I will be talking to old colleagues about it) I would just run NiFi on all
of those edge nodes and send the data to a central NiFi cluster. To me
there are at least two huge benefits to this:

   1. You use one tool, which provides an easy and very powerful way to
   control and adjust the dataflow without having to touch any scripts. You
   can filter / enrich / transform the data at the edge node, all via a UI.
   2. You get provenance information from the edge all the way back. This
   is very powerful: you can actually answer questions like "how come my
   log entry never made it to System X?", or even better, show how the data
   was changed along the way. The "why did my log entry never make it to
   System X" question can sometimes be answered by searching through logs,
   but that assumes the information is in the logs to begin with. I can
   tell you that these questions will come up. We had data that would go
   through a pipeline and finally land in HDFS, and we would get questions
   from app developers when they queried the data in Hive and wanted to
   know why certain log entries were missing.

Hope this helps.

In good health,

On Sat, May 7, 2016 at 8:15 AM, Conrad Crampton <
> wrote:

> Hi Bryan,
> Some good tips and validation of my thinking.
> It did occur to me to use standalone NiFi, as I have no particular need
> to use Logstash for any other reason.
> Thanks
> Conrad
> From: Bryan Bende <>
> Reply-To: "" <>
> Date: Friday, 6 May 2016 at 14:56
> To: "" <>
> Subject: Re: Logstash/ Filebeat/ Lumberjack -> Nifi
> Hi Conrad,
> I am not that familiar with Logstash, but as you mentioned there is a PR
> for Lumberjack processors [1] which is not yet released but could help if
> you are already using Logstash.
> If LogStash has outputs for TCP, UDP, or syslog then like you mentioned,
> it seems like this could work well with ListenTCP, ListenUDP, or
> ListenSyslog.
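For concreteness, here is a minimal sketch of what a TCP sender into ListenTCP could look like. The host and port are hypothetical placeholders, and it assumes ListenTCP's default newline-delimited framing, where each line becomes one event:

```python
import socket


def send_log_lines(lines, host="localhost", port=5140):
    """Send newline-delimited log lines over a single TCP connection.

    A NiFi ListenTCP processor bound to the same host/port would split
    the stream on newlines, turning each line into one event. The
    host/port defaults here are placeholders, not NiFi defaults.
    """
    # Normalize: strip any trailing newline, then terminate each line.
    payload = "".join(line.rstrip("\n") + "\n" for line in lines).encode("utf-8")
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)
    return payload
```

Batching many lines over one connection, as above, avoids paying connection-setup cost per event.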
> I think the only additional benefit of Lumberjack is that it is an
> application-level protocol that provides reliability on top of the
> networking protocol: when ListenLumberjack receives an event over TCP, it
> acknowledges that NiFi has successfully received and stored the data. TCP
> alone can only guarantee that the data was delivered to the socket; the
> application could still have dropped it.
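To illustrate that distinction, here is a toy application-level acknowledgement over TCP. This is not the actual Lumberjack wire format; the length-prefix-plus-ACK protocol below is invented purely to show why an explicit ack from the receiver is a stronger guarantee than TCP delivery alone:

```python
import socket


def send_with_ack(event: bytes, host: str, port: int, timeout: float = 5.0) -> bool:
    """Send one event and wait for an application-level acknowledgement.

    Toy protocol (NOT Lumberjack): a 4-byte big-endian length prefix,
    the event bytes, then the receiver replies b"ACK" once it has
    stored the event. Returning False means TCP may have delivered the
    bytes, but the application never confirmed storing them.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(len(event).to_bytes(4, "big") + event)
        reply = sock.recv(3)
        return reply == b"ACK"
```

A sender built this way can safely retry or buffer events whose ack never arrives, which is the reliability gap Lumberjack-style protocols close.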
> Although MiNiFi is not yet released, a possible solution is to run
> standalone NiFi instances on the servers where your logs are, with a simple
> flow like TailFile -> Remote Process Group which sends the logs back to a
> central NiFi instance over Site-To-Site.
> Are you able to share any more info about what kind of logs they are and
> how they are being produced?
> If they are coming from Java applications using logback or log4j, and if
> you have control over those applications, you can also use a specific
> appender like a UDP appender to send directly over to ListenUDP in NiFi.
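In Java that would be a logback or log4j appender; as a language-neutral sketch of the same idea, Python's stdlib `SysLogHandler` emits one syslog datagram per log call over UDP, which a ListenSyslog (or ListenUDP) processor bound to the same port could receive. The host/port below are placeholders:

```python
import logging
import logging.handlers


def make_udp_syslog_logger(host="localhost", port=5140):
    """Build a logger whose records go out as UDP syslog datagrams.

    Each log call becomes one datagram; a NiFi ListenSyslog (or
    ListenUDP) processor listening on the same host/port would receive
    them. The host/port defaults are hypothetical placeholders.
    """
    logger = logging.getLogger("app-to-nifi")
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Because UDP is fire-and-forget, this trades delivery guarantees for simplicity, which is exactly the gap the acknowledgement-based approaches above address.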
> Hope that helps.
> -Bryan
> [1]
> On Fri, May 6, 2016 at 3:33 AM, Conrad Crampton <
>> wrote:
>> Hi,
>> Some advice if possible please. Whilst I would love to wait for the
>> MiNiFi project realise its objectives as this sounds exactly what I want
>> from the initial suggestions I have a pressing need to shift some log files
>> on remote servers (to my DC) to my NiFi cluster. Having a quick look at
>> LogStash it would look to provide what I want but there doesn’t (yet – I’m
>> aware of the work going on Lumberjack processor but not in current release)
>> appear to be a simple way of getting files from Logstash to Nifi.
>> The options currently would appear to be to use one of a number of output
>> plugins in Logstash (TCP, UDP, syslog, Kafka, HTTP, RabbitMQ) and then
>> use the equivalent receiver in NiFi (with an intermediate service in some
>> cases: Kafka, RabbitMQ).
>> Can anyone suggest the ‘best’ way here? I’m trying to prove a point
>> about cutting out some other intermediate product, so this is something
>> that has to be in production now. I can always refactor at a later date
>> to a ‘better’ solution (MiNiFi?).
>> Why don’t I ask on Logstash forums? You folks have always been a great
>> help before ;-)
>> Thanks
>> Conrad
>> Nb. Of course not saying Logstash folks wouldn’t be equally helpful :-)


Subscribe to my book: Streaming Data <>
twitter: @itmdata <>
