nifi-users mailing list archives

From Andrew Grande <agra...@hortonworks.com>
Subject Re: Logstash/ Filebeat/ Lumberjack -> Nifi
Date Mon, 09 May 2016 12:17:54 GMT
Conrad,

Set up a site-to-site connection between NiFi edge nodes and your main processing cluster
running a bigger NiFi instance. This is the application-level protocol native to NiFi. MiNiFi,
in turn, uses it under the hood as well, which will ease migration for you in the _near_ future
;)

Andrew



On Sun, May 8, 2016 at 11:52 PM -0700, "Conrad Crampton" <conrad.crampton@SecData.com>
wrote:

Thanks for this – you make some very interesting points about the use of Logstash, and you
are correct: I am only just looking at Logstash, but will now look to use NiFi instead, if
possible, to connect to my central cluster.
Regards
Conrad

From: Andrew Psaltis <psaltis.andrew@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Saturday, 7 May 2016 at 16:43
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Logstash/ Filebeat/ Lumberjack -> Nifi

Hi Conrad,
Based on your email it sounds like you are potentially just getting started with Logstash.
The one thing I can share is that up until recently I worked in an environment where we had
~3,000 nodes deployed and all either had Logstash or Flume (was transitioning to Logstash).
We used Puppet and the Logstash module was in the base templates so as App developers provisioned
new nodes Logstash was automatically deployed and configured. I can tell you that it seems
really easy at first, however, my team was always messing with, tweaking, and troubleshooting
the Logstash scripts as we wanted to ingest different data sources, modify how the data was
captured, or fix bugs. Knowing now what I do about NiFi, if I had a chance to do it over again
(will be talking to old colleagues about it) I would just use NiFi on all of those edge nodes
and then send the data to a central NiFi cluster. To me there are at least two huge benefits
to this:

  1.  You use one tool, which provides an amazingly easy and very powerful way to control
and adjust the dataflow all without having to muck with any scripts. You can easily filter
/ enrich / transform the data at the edge node all via a UI.
  2.  You get provenance information from the edge all the way back. This is very powerful:
you can actually answer questions from others such as "how come my log entry never made it
to System X" or, even better, how the data was changed along the way. The "why didn't my log
entry make it to System X" question can sometimes be answered by searching through logs, but
that assumes you have the information in the logs to begin with. I can tell you that these
questions will come up. We had data that would go through a pipeline and finally into HDFS,
and we would get questions from app developers who queried the data in Hive and wanted to
know why certain log entries were missing.

Hope this helps.

In good health,
Andrew

On Sat, May 7, 2016 at 8:15 AM, Conrad Crampton <conrad.crampton@secdata.com>
wrote:
Hi Bryan,
Some good tips and validation of my thinking.
It did occur to me to use standalone NiFi, as I have no particular need to use Logstash
for any other reason.
Thanks
Conrad

From: Bryan Bende <bbende@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Friday, 6 May 2016 at 14:56
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Logstash/ Filebeat/ Lumberjack -> Nifi

Hi Conrad,

I am not that familiar with LogStash, but as you mentioned there is a PR for Lumberjack processors
[1] which is not yet released, but could help if you are already using LogStash.
If LogStash has outputs for TCP, UDP, or syslog then like you mentioned, it seems like this
could work well with ListenTCP, ListenUDP, or ListenSyslog.

I think the only additional benefit of Lumberjack is that it is an application level protocol
that provides additional reliability on top of the networking protocols, meaning if ListenLumberjack
receives an event over TCP it would then acknowledge that NiFi has successfully received and
stored the data, since TCP can only guarantee it was delivered to the socket, but the application
could have dropped it.
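
To make the distinction concrete, here is a minimal Python sketch of an application-level
acknowledgement on top of TCP. It is not the Lumberjack wire protocol and not how
ListenLumberjack is implemented; the newline framing, the "ACK" reply, and the function names
are all assumptions made up for illustration. The point is only that the receiver replies
after it has stored the event, so the sender learns more than "the bytes reached the socket".

```python
import socket
import threading


def serve_one_connection(server_sock, store):
    """Accept one client; store each newline-terminated event, then acknowledge it."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("rb") as incoming:
        for line in incoming:
            store.append(line.rstrip(b"\n"))  # "persist" the event first ...
            conn.sendall(b"ACK\n")            # ... and only then acknowledge it


def send_with_ack(host, port, event, timeout=5.0):
    """Send one event; return True only if the receiver acknowledged storing it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(event + b"\n")
        with sock.makefile("rb") as replies:
            return replies.readline() == b"ACK\n"


def demo():
    """Run receiver and sender on localhost and return (acked, stored events)."""
    store = []
    with socket.create_server(("127.0.0.1", 0)) as server_sock:
        port = server_sock.getsockname()[1]
        worker = threading.Thread(target=serve_one_connection,
                                  args=(server_sock, store))
        worker.start()
        acked = send_with_ack("127.0.0.1", port, b"2016-05-06 app started")
        worker.join(timeout=5)
    return acked, store
```

A sender built this way can retry or buffer an event when no acknowledgement arrives, which
plain TCP delivery alone does not let it decide.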

Although MiNiFi is not yet released, a possible solution is to run standalone NiFi instances
on the servers where your logs are, with a simple flow like TailFile -> Remote Process
Group which sends the logs back to a central NiFi instance over Site-To-Site.
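
For intuition about the tailing half of that flow, the following Python sketch shows the
basic idea of picking up only the lines appended since the last poll. It is not NiFi's
TailFile implementation; the function name and the byte-offset bookkeeping are assumptions
for illustration, and it makes no attempt at log rotation or persisting the offset across
restarts, which a real tailing tool has to deal with.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (complete lines appended after byte `offset`, new offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Only pass on complete lines; a partial trailing line waits for the next poll.
    end = chunk.rfind(b"\n") + 1
    return chunk[:end].splitlines(), offset + end


def demo():
    """Poll a growing file twice and return what each poll produced."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.log")
        with open(path, "wb") as f:
            f.write(b"first\nsecond\npart")   # last line is still being written
        first_poll, offset = read_new_lines(path, 0)
        with open(path, "ab") as f:
            f.write(b"ial\n")                 # the writer finishes the line
        second_poll, offset = read_new_lines(path, offset)
        return first_poll, second_poll, offset
```

In the flow described above, each batch of new lines would then be handed to the Remote
Process Group for transfer to the central instance.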

Are you able to share any more info about what kind of logs they are and how they are being
produced?
If they are coming from Java applications using logback or log4j, and if you have control
over those applications, you can also use a specific appender like a UDP appender to send
directly over to ListenUDP in NiFi.
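
The same "log straight over the network" idea can be sketched in Python with the standard
library's SysLogHandler, which sends syslog-formatted datagrams over UDP by default. This is
only an analogue of the logback/log4j appender approach, not a Java configuration; the logger
name, the message format, and the local receiving socket (standing in for a NiFi listener)
are assumptions for illustration.

```python
import logging
import logging.handlers
import socket


def make_udp_logger(host, port, name="edge-app"):
    """Return a logger (and its handler) that ships records to host:port over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler uses a UDP (SOCK_DGRAM) socket unless told otherwise.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler


def demo():
    """Log one message to a local UDP socket and return the raw datagram."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # stand-in for the listening end
    receiver.settimeout(5.0)
    port = receiver.getsockname()[1]
    logger, handler = make_udp_logger("127.0.0.1", port)
    try:
        logger.info("connection pool exhausted")
        data, _ = receiver.recvfrom(4096)
    finally:
        logger.removeHandler(handler)
        handler.close()
        receiver.close()
    return data
```

Bear in mind that UDP gives no delivery guarantee, so this suits logs where occasional loss
is acceptable.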

Hope that helps.

-Bryan

[1] https://github.com/apache/nifi/pull/290

On Fri, May 6, 2016 at 3:33 AM, Conrad Crampton <conrad.crampton@secdata.com>
wrote:
Hi,
Some advice if possible please. Whilst I would love to wait for the MiNiFi project to realise
its objectives, as from the initial suggestions it sounds like exactly what I want, I have a
pressing need to shift some log files on remote servers (remote to my DC) to my NiFi cluster.
Having had a quick look at Logstash, it looks to provide what I want, but there doesn’t yet
appear to be a simple way of getting files from Logstash to NiFi (I’m aware of the work going
on for a Lumberjack processor, but it is not in the current release).

The options currently would appear to be use any number of output plugins in Logstash –
TCP, UDP, syslog, kafka, http, rabbitmq then use the equivalent receiver in Nifi (with some
intermediate service in some cases – Kafka, rabbitmq).

Can anyone suggest the ‘best’ way here? I’m trying to prove a point about cutting out
some other intermediate product, so this is something that has to be in production now –
I can always refactor at a later date to have a ‘better’ solution (MiNiFi ??).

Why don’t I ask on Logstash forums? You folks have always been a great help before ;-)

Thanks
Conrad

Nb. Of course not saying Logstash folks wouldn’t be equally helpful :-)



--
Thanks,
Andrew

Subscribe to my book: Streaming Data <http://manning.com/psaltis>
twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>
