nifi-dev mailing list archives

From Matthew Clarke <matt.clarke....@gmail.com>
Subject Re: Clustered Site-toSite
Date Thu, 26 Nov 2015 05:35:09 GMT
The PostHTTP processor has an option to send as a FlowFile to a ListenHTTP
processor on another NiFi. This allows you to keep the FlowFile attributes
across multiple NiFis just like S2S.
On Nov 25, 2015 1:58 PM, "Matthew Gaulin" <mattgaulin@gmail.com> wrote:

> Ok, that all makes sense.  The main reason we like doing it strictly as
> S2S is to maintain the flowfile attributes, so we would like to avoid
> HTTP.  Otherwise we would have to rebuild some of these attributes from the
> content, which isn't the end of the world, but still no fun.  We may
> consider the idea of the single receive node for distribution to a cluster,
> in order to further lock things down from a firewall standpoint.  I think
> the main thing we had to wrap our heads around was that every send node
> needs to be able to directly connect to every receiver node.  Thanks again
> for the very detailed responses!
>
> On Wed, Nov 25, 2015 at 10:44 AM Matthew Clarke <matt.clarke.138@gmail.com>
> wrote:
>
> > I am not following why you set all your Nodes (source and destination) to
> > use the same hostname(s).  Each hostname resolves to a single IP and by
> > doing so doesn't all data get sent to a single end-point?
> >
> > The idea behind spreading out the connections when using S2S is smart
> > load balancing.  If all data going to another cluster passed through the
> > NCM first, you would lose that load balancing capability because one
> > instance of NiFi (the NCM in this case) would have to receive all that
> > network traffic.  It sounds like the approach you want is to send source
> > data to a single NiFi point on another network and then have that single
> > point redistribute that data internally to that network across multiple
> > "processing" nodes in a cluster.
> >
> > This can be accomplished in several ways:
> >
> > 1. You could use S2S to send to a single instance of NiFi on the other
> > network and then have that instance S2S that data to a cluster on that
> > same network.
> > 2. You could use the PostHTTP (source NiFi) and ListenHTTP (destination
> > NiFi) processors to send data to a single Node in the destination
> > cluster, and then have that Node use S2S to redistribute the data across
> > the entire cluster.
> >
> > A more ideal setup, to limit the connections needed between networks,
> > might be:
> >
> > - Source cluster (consisting of numerous low-end servers or VMs) and a
> > single instance running on a beefy server/VM that will handle all data
> > coming in and out of this network.  Use S2S to communicate between the
> > internal cluster and the single instance on the same network.
> > - The destination would be set up the same way; its cluster would look
> > the same.
> >
> > You can then use S2S or PostHTTP to ListenHTTP to send data as NiFi
> > FlowFiles between your networks.  That network-to-network data transfer
> > should occur between the two beefy single instances in each network.
> >
> > Matt
> >
> >
> >
> >
> > On Wed, Nov 25, 2015 at 9:10 AM, Matthew Gaulin <mattgaulin@gmail.com>
> > wrote:
> >
> > > Thank you for the info.  I was working with Edgardo on this.  We ended up
> > > having to set the SAME hostname on each of the source nodes as the
> > > destination NCM uses for each of its nodes, and of course open up the
> > > firewall rules so all source nodes can talk to each destination node.
> > > This seems to jibe with what you explained above.  It is a little
> > > annoying that we have to have so much open to get this to work and can't
> > > have a single point of entry on the NCM to send all this data from one
> > > network to another.  Not a huge deal in the end though.  Thanks again.
> > >
> > > On Wed, Nov 25, 2015 at 8:36 AM Matthew Clarke <matt.clarke.138@gmail.com>
> > > wrote:
> > >
> > > > Let me first explain how S2S works when connecting from one cluster to
> > > > another cluster.
> > > >
> > > > I will start with the source cluster (this would be the cluster where
> > > > you are adding the Remote Process Group (RPG) to the graph).  The NCM
> > > > has no role in this cluster.  Every Node in a cluster works
> > > > independently from one another, so by adding the RPG to the graph, you
> > > > have added it to every Node.  So now the behavior of each Node is the
> > > > same as it would be if it were a standalone instance with regards to
> > > > S2S.  The URL you are providing in that RPG would be the URL for the
> > > > NCM of the target cluster (this URL is not to the S2S port of the NCM,
> > > > but the same URL you would use to access the UI of that cluster).  Now
> > > > each Node in your "source" cluster is communicating with the NCM of the
> > > > destination cluster, unaware at this time that it is communicating
> > > > with an NCM.  These Nodes want to send their data to the S2S port on
> > > > that NCM.  Of course, since the NCM does not process any data, it is
> > > > not going to accept any data from those Nodes.  The "destination" NCM
> > > > will respond to each of the "source" Nodes with the configured
> > > > nifi.remote.input.socket.host=, nifi.remote.input.socket.port=, and the
> > > > status for each of those "destination" Nodes.  Using that information,
> > > > the source Nodes can logically distribute the data to the "destination"
> > > > Nodes.
> > > >
> > > > When S2S fails beyond the initial URL connection, there are typically
> > > > only a few likely causes:
> > > > 1. There is a firewall preventing communication between the source
> > > > Nodes and the destination Nodes on the S2S ports.
> > > > 2. No value was supplied for nifi.remote.input.socket.host= on each of
> > > > the target Nodes.  When no value is provided, whatever the "hostname"
> > > > command returns is what is sent.  In many cases this hostname may end
> > > > up being "localhost" or some other value that is not
> > > > resolvable/reachable by the "source" systems.
> > > >
> > > > You can change the logging for S2S to DEBUG to see more detail about
> > > > the message traffic between the "destination" NCM and the "source"
> > > > Nodes by adding the following line to the logback.xml file.
> > > >
> > > > <logger name="org.apache.nifi.remote" level="DEBUG"/>
> > > >
> > > > Watch the logs on one of the source Nodes specifically to see what
> > > > hostname and port are being returned for each destination Node.
> > > >
> > > > Thanks,
> > > > Matt
> > > >
> > > > On Wed, Nov 25, 2015 at 7:59 AM, Matthew Clarke <matt.clarke.138@gmail.com>
> > > > wrote:
> > > >
> > > > >
> > > > >
> > > > > On Tue, Nov 24, 2015 at 1:38 PM, Edgardo Vega <edgardo.vega@gmail.com>
> > > > > wrote:
> > > > >
> > > > > >> Yeah, the S2S port is set on all nodes.
> > > > > >>
> > > > > >> What should the host be set to on each machine? I first set it to
> > > > > >> the NCM IP on each machine in the cluster. Then I set the host to
> > > > > >> be the IP of each individual machine, without luck.
> > > > > >>
> > > > > >> The S2S ports are open to the internet for the entire cluster.
> > > > >>
> > > > > >> On Tue, Nov 24, 2015 at 1:35 PM, Matthew Clarke <matt.clarke.138@gmail.com>
> > > > > >> wrote:
> > > > >>
> > > > > >> > Did you configure the S2S port on all the Nodes in the cluster
> > > > > >> > you are trying to S2S to?
> > > > > >> >
> > > > > >> > In addition to setting the port on those Nodes, you should also
> > > > > >> > set the S2S hostname.  The hostname entered should be resolvable
> > > > > >> > and reachable by the systems trying to S2S to that cluster.
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Matt
> > > > >> >
> > > > > >> > On Tue, Nov 24, 2015 at 1:29 PM, Edgardo Vega <edgardo.vega@gmail.com>
> > > > > >> > wrote:
> > > > >> >
> > > > > >> > > Trying to get site to site working from one cluster to
> > > > > >> > > another.  It works if the connection goes from a cluster to a
> > > > > >> > > single node, but not from clustered to clustered.
> > > > > >> > >
> > > > > >> > > I was looking at jira and saw this ticket
> > > > > >> > > https://issues.apache.org/jira/browse/NIFI-872.
> > > > > >> > >
> > > > > >> > > Is this saying I am out of luck, or is there some special
> > > > > >> > > config that I must do to make this work?
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > --
> > > > >> > > Cheers,
> > > > >> > >
> > > > >> > > Edgardo
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Cheers,
> > > > >>
> > > > >> Edgardo
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
