nifi-dev mailing list archives

From Matthew Clarke <matt.clarke....@gmail.com>
Subject Re: Clustered Site-to-Site
Date Wed, 25 Nov 2015 15:44:22 GMT
I am not following why you set all your Nodes (source and destination) to
use the same hostname(s).  Each hostname resolves to a single IP, so doesn't
all data end up being sent to a single endpoint?

The idea behind spreading out the connections when using S2S is smart load
balancing.  If all data going to another cluster passed through the NCM
first, you would lose that load-balancing capability, because one instance
of NiFi (the NCM in this case) would have to receive all that network
traffic.  It sounds like the approach you want is to send source data to a
single NiFi point on another network and then have that single point
redistribute that data internally to that network across multiple
"processing" nodes in a cluster.

This can be accomplished in several ways:

1. You could use S2S to send to a single instance of NiFi on the other
network and then have that instance use S2S to send the data on to a
cluster on that same network.
2. You could use the PostHTTP (source NiFi) and ListenHTTP (destination
NiFi) processors to send data to a single Node in the destination
cluster, and then have that Node use S2S to redistribute the data across
the entire cluster.
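To make option 2 concrete: the PostHTTP-to-ListenHTTP path is essentially an
HTTP POST of the FlowFile content to the single listening Node.  A minimal
sketch in Python, where the host, the port (9999), and a base path of
"contentListener" are assumptions for illustration, not values from this
thread:

```python
# Sketch of what PostHTTP -> ListenHTTP amounts to on the wire: an HTTP
# POST of the FlowFile content to the single listening Node.  The host,
# port, and base path below are hypothetical.
from urllib import request

url = "http://dest-node.example.com:9999/contentListener"
req = request.Request(url, data=b"flowfile content", method="POST")
req.add_header("Content-Type", "application/octet-stream")

# request.urlopen(req) would perform the actual transfer; it is left out
# here because the endpoint above is hypothetical.
print(req.get_method(), req.full_url)
```

That single receiving Node would then use S2S internally to spread the data
across its own cluster.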

A more ideal setup to limit the connections needed between networks might be:

- Source network: a cluster (consisting of numerous low-end servers or VMs)
plus a single instance running on a beefy server/VM that handles all data
coming into and out of that network.  Use S2S to communicate between the
internal cluster and the single instance on the same network.
- The destination network would be set up the same way.  You can then use
S2S, or PostHTTP to ListenHTTP, to send data as NiFi FlowFiles between your
networks.  That network-to-network data transfer should occur between the
two beefy single instances in each network.
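As a rough sketch, the S2S entries in nifi.properties on each of those
single beefy edge instances might look like this (the hostname and port are
placeholder values, not taken from this thread):

```properties
# Example values only.  Each side's host must be resolvable and reachable
# from the other network, and the port must be open in the firewall
# between the two edge instances.
nifi.remote.input.socket.host=edge-nifi.example.com
nifi.remote.input.socket.port=10000
```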

Matt




On Wed, Nov 25, 2015 at 9:10 AM, Matthew Gaulin <mattgaulin@gmail.com>
wrote:

> Thank you for the info.  I was working with Edgardo on this.  We ended up
> having to set the SAME hostname on each of the source nodes as the
> destination NCM uses for each of its nodes, and of course open up the
> firewall rules so all source nodes can talk to each destination node.  This
> seems to jibe with what you explained above.  It is a little annoying that
> we have to have so much open to get this to work and can't have a single
> point of entry on the NCM to send all this data from one network to
> another.  Not a huge deal in the end though.  Thanks again.
>
> On Wed, Nov 25, 2015 at 8:36 AM Matthew Clarke <matt.clarke.138@gmail.com>
> wrote:
>
> > Let me explain first how S2S works when connecting from one cluster to
> > another cluster.
> >
> > I will start with the source cluster (this would be the cluster where you
> > are adding the Remote Process Group (RPG) to the graph).  The NCM has no
> > role in this cluster.  Every Node in a cluster works independently from
> > one another, so by adding the RPG to the graph, you have added it to every
> > Node.  So now the behavior of each Node with regard to S2S is the same as
> > it would be if it were a standalone instance.  The URL you are providing
> > in that RPG is the URL for the NCM of the target cluster (this URL is not
> > the S2S port of the NCM, but the same URL you would use to access the UI
> > of that cluster).  Now each Node in your "source" cluster is communicating
> > with the NCM of the destination cluster, unaware at this time that it is
> > communicating with an NCM.  These Nodes want to send their data to the S2S
> > port on that NCM.  Of course, since the NCM does not process any data, it
> > is not going to accept any data from those Nodes.  Instead, the
> > "destination" NCM will respond to each of the "source" Nodes with the
> > configured nifi.remote.input.socket.host=, nifi.remote.input.socket.port=,
> > and the status for each of the "destination" Nodes.  Using that provided
> > information, the source Nodes can logically distribute the data across the
> > "destination" Nodes.
> >
> > When S2S fails beyond the initial URL connection, there are typically
> > only a few likely causes:
> > 1. There is a firewall preventing communication between the source Nodes
> > and the destination Nodes on the S2S ports.
> > 2. No value was supplied for nifi.remote.input.socket.host= on each of
> > the target Nodes.  When no value is provided, whatever the "hostname"
> > command returns is what is sent.  In many cases this hostname may end up
> > being "localhost" or some other value that is not resolvable/reachable by
> > the "source" systems.
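A quick check for cause 2 is to run, on each destination Node, the same
command NiFi falls back on when the property is blank:

```shell
# When nifi.remote.input.socket.host= is left blank, NiFi falls back on
# the system hostname.  If this prints "localhost" or a name the source
# systems cannot resolve, set the property explicitly on that Node.
hostname
```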
> >
> > You can change the logging for S2S to DEBUG to see more detail about the
> > message traffic between the "destination" NCM and the "source" Nodes by
> > adding the following line to the logback.xml files:
> >
> > <logger name="org.apache.nifi.remote" level="DEBUG"/>
> >
> > Watch the logs on one of the source Nodes specifically to see what
> > hostname and port are being returned for each destination Node.
> >
> > Thanks,
> > Matt
> >
> > On Wed, Nov 25, 2015 at 7:59 AM, Matthew Clarke <matt.clarke.138@gmail.com>
> > wrote:
> >
> > >
> > >
> > > On Tue, Nov 24, 2015 at 1:38 PM, Edgardo Vega <edgardo.vega@gmail.com>
> > > wrote:
> > >
> > >> Yeah, the S2S port is set on all nodes.
> > >>
> > >> What should the host be set to on each machine? I first set it to the
> > >> NCM IP on each machine in the cluster. Then I set the host to be the IP
> > >> of each individual machine, without luck.
> > >>
> > >> The S2S ports are open to the internet for the entire cluster.
> > >>
> > >> On Tue, Nov 24, 2015 at 1:35 PM, Matthew Clarke <matt.clarke.138@gmail.com>
> > >> wrote:
> > >>
> > >> > Did you configure the S2S port on all the Nodes in the cluster you
> > >> > are trying to S2S to?
> > >> >
> > >> > In addition to setting the port on those Nodes, you should also set
> > >> > the S2S hostname.  The hostname entered should be resolvable and
> > >> > reachable by the systems trying to S2S to that cluster.
> > >> >
> > >> > Thanks,
> > >> > Matt
> > >> >
> > >> > On Tue, Nov 24, 2015 at 1:29 PM, Edgardo Vega <edgardo.vega@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Trying to get site-to-site working from one cluster to another.  It
> > >> > > works if the connection goes from cluster to single node, but not
> > >> > > clustered to clustered.
> > >> > >
> > >> > > I was looking at jira and saw this ticket
> > >> > > https://issues.apache.org/jira/browse/NIFI-872.
> > >> > >
> > >> > > Is this saying I am out of luck, or is there some special config
> > >> > > that I must do to make this work?
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Cheers,
> > >> > >
> > >> > > Edgardo
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Cheers,
> > >>
> > >> Edgardo
> > >>
> > >
> > >
> >
>
