nifi-dev mailing list archives

From Mark Payne <marka...@hotmail.com>
Subject Re: Unable to modify flow when one of the nodes in a cluster is disconnected
Date Thu, 27 Jun 2019 14:04:05 GMT
Purushotham,

If a node is disconnected and then attempts to reconnect, flow election does not occur. Rather, the node obtains a copy of the flow from the cluster and determines whether it matches its own flow. If it does, the node rejoins; if it does not, the node disconnects and stops trying to reconnect.
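The reconnection behavior described above can be sketched as follows. This is a minimal illustration only, assuming a flow can be compared with plain equality. `Node`, `try_reconnect`, and `fetch_cluster_flow` are hypothetical names and are not part of NiFi's (Java) codebase or API.

```python
from typing import Any, Callable


class Node:
    """Hypothetical, simplified model of a disconnected cluster node."""

    def __init__(self, flow: Any) -> None:
        self.flow = flow              # the node's own copy of the flow
        self.connected = False
        self.retry_reconnect = True   # cleared once a mismatch is found

    def try_reconnect(self, fetch_cluster_flow: Callable[[], Any]) -> bool:
        """One reconnection attempt; no flow election takes place."""
        if not self.retry_reconnect:
            return False              # the node has already given up
        cluster_flow = fetch_cluster_flow()  # hypothetical: obtain the cluster's copy
        if cluster_flow == self.flow:
            self.connected = True     # flows match: rejoin the cluster
        else:
            self.retry_reconnect = False  # mismatch: stay disconnected, stop retrying
        return self.connected
```

In this sketch the cluster's flow is treated as authoritative: the node's flow is only compared against it, never voted on, and after one mismatch later attempts return `False` without checking again.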

There are a few reasons that the node doesn't just blindly inherit the cluster's flow. Firstly, if a user were to delete a connection while the node was disconnected, and the rejoining node had data queued in that connection, that data would be lost. This is probably the most important reason: we never want to design for data loss.

Secondly, when a node is disconnected from the cluster, the user is able to make changes to that node's flow. There are times when users will disconnect a particular node from the cluster and make some changes to the dataflow for diagnostic purposes. For example, they may want to temporarily send data to a new endpoint for sampling. When this happens, we don't want to silently discard those changes, because the user may want to keep them. And if an admin is managing several systems, they could accidentally configure the node to point to the wrong cluster, in which case the node could lose its entire dataflow. That is perhaps not a problem if the dataflow exists on other nodes, but if this is a standalone node being converted into a cluster, it could be devastating for the user.

Now, there are some changes that we do allow while still letting the node rejoin: for instance, if the positions of elements change, or elements are started or stopped. In these cases, the rejoining node simply inherits the flow from the cluster and takes on those changes.
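One way to picture these tolerated differences is a comparison that ignores canvas position and run state, after which the node adopts the cluster's copy. This is a hedged sketch, not NiFi's actual matching logic: `Component`, `fingerprint`, and `reconcile` are hypothetical, and the real set of tolerated differences is likely broader than the two fields modeled here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Component:
    """Hypothetical processor model, for illustration only."""
    component_id: str
    component_type: str
    properties: Tuple[Tuple[str, str], ...]  # sorted (name, value) pairs
    position: Tuple[float, float]            # canvas (x, y); tolerated
    running: bool                            # started/stopped; tolerated


def fingerprint(components: List[Component]) -> FrozenSet[tuple]:
    """Keep only what must match; drop canvas position and run state."""
    return frozenset(
        (c.component_id, c.component_type, c.properties) for c in components
    )


def reconcile(
    local: List[Component], cluster: List[Component]
) -> Optional[List[Component]]:
    """Return the flow the node should run, or None if it must stay disconnected."""
    if fingerprint(local) != fingerprint(cluster):
        return None
    # Only tolerated differences remain, so inherit the cluster's copy as-is,
    # including its positions and started/stopped state.
    return list(cluster)
```

Moving or stopping a component on the node still lets it rejoin with the cluster's layout and run state, while changing a property value makes `reconcile` return `None`.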

I think it would probably be advantageous to allow the node to back up its own flow before inheriting from the cluster, and then apply any changes from the cluster that do not result in data loss (i.e., if any removed connection still holds data on the node, fail; otherwise, inherit). The big downside there, honestly, is that it would take a huge amount of effort to make that work properly.
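The proposal above (back up first, then inherit unless a removed connection still holds data) might look roughly like this. It sketches an idea from this thread, not an existing NiFi feature; the per-connection queue counts, `DataLossError`, and `inherit_cluster_flow` are all hypothetical, and the sketch only models connections.

```python
from typing import Dict, Iterable, Tuple


class DataLossError(Exception):
    """Raised when inheriting the cluster's flow would drop queued data."""


def inherit_cluster_flow(
    local_queues: Dict[str, int], cluster_connections: Iterable[str]
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (inherited_flow, backup) or raise DataLossError.

    local_queues maps each connection ID on this node to its queued FlowFile
    count; cluster_connections lists the connection IDs in the cluster's flow.
    """
    # 1. Back up the node's own flow before touching anything.
    backup = dict(local_queues)
    cluster_ids = list(cluster_connections)
    # 2. Find connections the cluster removed that still hold data here.
    removed = set(local_queues) - set(cluster_ids)
    at_risk = sorted(c for c in removed if local_queues[c] > 0)
    if at_risk:
        raise DataLossError(f"removed connections still hold data: {at_risk}")
    # 3. Safe to inherit: surviving connections keep their queues,
    #    connections new to this node start empty.
    inherited = {c: local_queues.get(c, 0) for c in cluster_ids}
    return inherited, backup
```

For example, dropping an empty connection `b` is allowed, while dropping one with three queued FlowFiles raises `DataLossError` and leaves the node's own flow untouched, since the function never mutates `local_queues`.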

So, to make a long story short: there are reasons that we don't just inherit the flow, but we could work around those problems. There are definitely areas where we could improve; it's just that nobody in the community has taken this on yet.

Thanks
-Mark


On Jun 27, 2019, at 3:37 AM, Purushotham Pushpavanthar <pushpavanthar@gmail.com> wrote:

Hi,

I have a 3-node cluster (version 1.9.2) running in production. Because our infrastructure is unreliable for various reasons, our nodes go down often. We don't distinguish between dev and prod clusters; we modify, deploy, and test in the same cluster. However, when one of the nodes goes down, NiFi prevents us from modifying the flow and shows the warning window in the attachment.

I read in the Administration Guide <https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#flow-election> that if a node in the cluster is disconnected and comes back, flow election happens. I would like to understand the motivation for not allowing changes to the flow in the above scenario. Why can't the node that most recently joined the cluster pull the elected flow.xml.gz from the cluster and apply it to itself?

Regards,
Purushotham Pushpavanth


