nifi-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Curtis Ruck <curtis.r...@gmail.com>
Subject 1.8.0 Clustering Timeouts
Date Mon, 20 May 2019 14:06:55 GMT
I have a new 3 node cluster i'm trying to stand up.

On most of the API calls (to create proxy the proxy user) and others for
debugging, i get a "java.net.SocketTimeoutException: timeout" returned.  In
debugging the issue, the timeout occurs when the API replicates the request
to the same instance receiving the request.  I.e when node1 replicates the
request to node1.

I've attached a pruned debug logfile showing the /controller/config request
with the current request "nifi.cluster.node.read.timeout=10 sec".  From
what I can tell in the attached log is, Thread "Nifi Web Server-24" gets a
request for /nifi-api/controller/cluster" it replicates the request to
itself, which arrives 10 secs (the request timeout) later and comes in on
thread "NiFi Web Server-161" at roughly the same moment (5-6 ms after)
okhttp3 timeouts the request.  When 161 goes to respond the connection is
already closed.  So I think what's happening is that node1 locks/blocks
something that the replicated request needs to read from, but can't until
the lock is released when the request timeouts.  So why is the request
getting replicated to itself?

In troubleshooting I tried finding relevant REST endpoints and i get these
results:

/controller/config -> java.net.SocketTimeoutException: timeout
/flow/status ->  java.net.SocketTimeoutException: timeout

/flow/cluster/summary ->  {
  clusterSummary => {
    connectedNodes => "3 / 3",
    connectedNodeCount => 3,
    totalNodeCount => 3,
    connectedToCluster => true,
    clustered => true
  }
}

/controller/cluster -> {
  cluster => {
    nodes => [
      {
        nodeId => "8d066e95-b254-4527-a214-79ca5359f078",
        address => "node2.example.com",
        apiPort => 8443,
        status => "CONNECTED",
        heartbeat => "05/20/2019 02:19:13 UTC",
        roles => [
          "Cluster Coordinator"
        ],
        activeThreadCount => 0,
        queued => "0 / 0 bytes",
        events => [
          {
            timestamp => "05/20/2019 02:17:35 UTC",
            category => "INFO",
            message => "Received first heartbeat from connecting node. Node
connected."
          },
          {
            timestamp => "05/20/2019 02:17:20 UTC",
            category => "INFO",
            message => "Connection requested from existing node. Setting
status to connecting."
          },
          {
            timestamp => "05/20/2019 02:17:20 UTC",
            category => "INFO",
            message => "Connection requested from existing node. Setting
status to connecting."
          },
          {
            timestamp => "05/20/2019 02:17:14 UTC",
            category => "INFO",
            message => "Requesting that node connect to cluster"
          },
          {
            timestamp => "05/20/2019 02:17:14 UTC",
            category => "INFO",
            message => "Received heartbeat from node previously
disconnected due to Has Not Yet Connected
          }
        ],
        nodeStartTime => "05/20/2019 02:17:08 UTC"
      },
      {
        nodeId => "e5785deb-5646-421f-b858-e35342195988",
        address => "node3.example.com",
        apiPort => 8443,
        status => "CONNECTED",
        heartbeat => "05/20/2019 02:19:10 UTC",
        roles => [
          "Primary Node"
        ],
        activeThreadCount => 0,
        queued => "0 / 0 bytes",
        events => [
          {
            timestamp => "05/20/2019 02:17:30 UTC",
            category => "INFO",
            message => "Received first heartbeat from connecting node. Node
connected."
          },
          {
            timestamp => "05/20/2019 02:17:17 UTC",
            category => "INFO",
            message => "Connection requested from existing node. Setting
status to connecting."
          },
          {
            timestamp => "05/20/2019 02:17:17 UTC",
            category => "INFO",
            message => "Connection requested from existing node. Setting
status to connecting."
          },
          {
            timestamp => "05/20/2019 02:17:14 UTC",
            category => "INFO",
            message => "Received heartbeat from node previously
disconnected due to Has Not Yet Connected
          },
          {
            timestamp => "05/20/2019 02:17:14 UTC",
            category => "INFO",
            message => "Requesting that node connect to cluster"
          }
        ],
        nodeStartTime => "05/20/2019 02:17:06 UTC"
      },
      {
        nodeId => "334f49b5-d714-4013-8180-0ffca95aa829",
        address => " node1.example.com  ",
        apiPort => 8443,
        status => "CONNECTED",
        heartbeat => "05/20/2019 02:19:10 UTC",
        roles => [],
        activeThreadCount => 0,
        queued => "0 / 0 bytes",
        events => [
          {
            timestamp => "05/20/2019 02:17:35 UTC",
            category => "INFO",
            message => "Received first heartbeat from connecting node. Node
connected."
          },
          {
            timestamp => "05/20/2019 02:17:23 UTC",
            category => "INFO",
            message => "Connection requested from existing node. Setting
status to connecting."
          },
          {
            timestamp => "05/20/2019 02:17:17 UTC",
            category => "INFO",
            message => "Connection requested from existing node. Setting
status to connecting."
          },
          {
            timestamp => "05/20/2019 02:17:15 UTC",
            category => "INFO",
            message => "Received heartbeat from node previously
disconnected due to Has Not Yet Connected
          },
          {
            timestamp => "05/20/2019 02:17:15 UTC",
            category => "INFO",
            message => "Requesting that node connect to cluster"
          }
        ],
        nodeStartTime => "05/20/2019 02:17:06 UTC"
      }
    ],
    generated => "02:19:15 UTC"
  }
}


--
Curtis Ruck

Mime
View raw message