nifi-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bryan Bende <bbe...@gmail.com>
Subject Re: PutTCP connector not cleaning up dangling connections
Date Mon, 18 Sep 2017 13:16:43 GMT
The code in SocketChannelSender that Davy pointed out could definitely
be the problem...

It makes a non-blocking channel and calls connect, then goes into a
loop waiting for finishConnect() to return true, but if that doesn't
happen before the configured timeout, then it throws an exception, but
it doesn't first close the channel, and the processor also doesn't
close the sender.

As an example comparison, this code which is being used by PutTCP:

https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-processor-utils/src/main/java/org/apache/nifi/processor/util/put/sender/SocketChannelSender.java#L45-L78

Should be doing something like this where it catches, closes, and rethrows:

https://github.com/apache/nifi/blob/master/nifi-commons/nifi-security-utils/src/main/java/org/apache/nifi/remote/io/socket/ssl/SSLSocketChannel.java#L140-L179


On Mon, Sep 18, 2017 at 9:09 AM, Joe Witt <joe.witt@gmail.com> wrote:
> Davy
>
> Interesting.  So in looking through the stack trace I don't see
> anything related to sockets nifi has initiated to another service and
> nothing for PutTCP.  I'm not saying that means there is nothing but
> the stack traces only show the custom GetTCP processors.
>
> You can use netstat to show open sockets from the nifi process.  Can
> you try that and share those?
>
> Does the NiFi UI show the processor as having stuck threads?  I'm
> guessing not since there is nothing in the stack traces.
>
> Thanks
>
> On Mon, Sep 18, 2017 at 1:54 AM, ddewaele <ddewaele@gmail.com> wrote:
>> Stopping the processor doesn't cleanup the tcp connection. It remains
>> ESTABLISHED.
>>
>> There are 2 ways of getting out of it (none of them are ideal).
>>
>> - Restarting Nifi
>> - Restarting the Moxa serial ports
>>
>> I've dumped the output in the following gist :
>> https://gist.github.com/ddewaele/83705003740674962c1e133fb617f68c
>>
>> The GetTCP processor you'll see in the thread dump also interacts with the
>> moxa. It is a Netty based custom processor we created (because there was no
>> GetTCP at the time). However, we log all interactions (including client
>> ports) with this processor and all of them end up getting closed correctly.
>>
>> So the "hanging" connection originated from the built-in PutTCP processor.
>>
>>
>> Joe Witt wrote
>>> If you stop the processor manually does it clean them up?
>>>
>>> When the connections appear stuck can you please get a thread dump?
>>>
>>> bin/nifi.sh dump
>>>
>>> The results end up in bootstrap.log.
>>>
>>> Thanks
>>> Joe
>>>
>>> On Sep 17, 2017 2:22 PM, "ddewaele" &lt;
>>
>>> ddewaele@
>>
>>> &gt; wrote:
>>>
>>>> We are using NiFi PutTCP processors to send messages to a number of Moxa
>>>> onCell ip gateway devices.
>>>>
>>>> These Moxa devices are running on a cellular network with not always the
>>>> most ideal connection. The Moxa only allows for a maximum of 2
>>>> simultaneous
>>>> client connections.
>>>>
>>>> What we notice is that although we specify connection / read timeouts on
>>>> both PutTCP and the Moxa, that sometimes a connection get "stuck". (In
>>>> the
>>>> moxa network monitoring we see 2 client sockets coming from PutTCP in the
>>>> ESTABLISHED state that never go away).
>>>>
>>>> This doesn't always happen, but often enough for it to be considered a
>>>> problem, as it requires a restart of the moxa ports to clear the
>>>> connections
>>>> (manual step). It typically happens when PutTCP experiences a Timeout.
>>>>
>>>> On the PutTCP processors we have the following settings :
>>>>
>>>> - Idle Connection Expiration : 30 seconds  (we've set this higher due to
>>>> bad
>>>> gprs connection)
>>>> - Timeout : 10 seconds (this is only used as a timeout for establishing
>>>> the
>>>> connection)
>>>>
>>>> On the Moxas we have
>>>>
>>>> - TCP alive check time : 2min (this should force the Moxa to close the
>>>> socket)
>>>>
>>>> Yet for some reason the connection remains established.
>>>>
>>>> Here's what I found out :
>>>>
>>>> On the moxa I noticed a connection (with client port 48440) that is in
>>>> ESTABLISHED mode for 4+ hours. (blocking other connections). On the Moxa
>>>> I
>>>> can see when the connection was established :
>>>>
>>>> 2017/09/17 14:20:29 [OpMode] Port01 Connect 10.192.2.90:48440
>>>>
>>>> I can track that down in Nifi via the logs (unfortunately PutTCP doesn't
>>>> log
>>>> client ports, but from the timestamp  I'm sure it's this connection :
>>>>
>>>> 2017-09-17 14:20:10,837 DEBUG [Timer-Driven Process Thread-10]
>>>> o.apache.nifi.processors.standard.PutTCP
>>>> PutTCP[id=80231a39-1008-1159-a6fa-1f9e3751d608] No available connections,
>>>> creating a new one...
>>>> 2017-09-17 14:20:20,860 ERROR [Timer-Driven Process Thread-10]
>>>> o.apache.nifi.processors.standard.PutTCP
>>>> PutTCP[id=80231a39-1008-1159-a6fa-1f9e3751d608] No available connections,
>>>> and unable to create a new one, transferring
>>>> StandardFlowFileRecord[uuid=79f2a166-5211-4d2d-9275-03f0ce4d5b29,claim=
>>>> StandardContentClaim
>>>> [resourceClaim=StandardResourceClaim[id=1505641210025-1,
>>>> container=default,
>>>> section=1], offset=84519, length=9],offset=0,name=
>>>> 23934743676390659,size=9]
>>>> to failure: java.net.SocketTimeoutException: Timed out connecting to
>>>> 10.32.133.40:4001
>>>> 2017-09-17 14:20:20,860 ERROR [Timer-Driven Process Thread-10]
>>>> o.apache.nifi.processors.standard.PutTCP
>>>> java.net.SocketTimeoutException: Timed out connecting to
>>>> 10.32.133.40:4001
>>>>         at
>>>> org.apache.nifi.processor.util.put.sender.SocketChannelSender.open(
>>>> SocketChannelSender.java:66)
>>>> ~[nifi-processor-utils-1.1.0.jar:1.1.0]
>>>>         at
>>>> org.apache.nifi.processor.util.put.AbstractPutEventProcessor.createSender(
>>>> AbstractPutEventProcessor.java:312)
>>>> ~[nifi-processor-utils-1.1.0.jar:1.1.0]
>>>>         at
>>>> org.apache.nifi.processors.standard.PutTCP.createSender(PutTCP.java:121)
>>>> [nifi-standard-processors-1.1.0.jar:1.1.0]
>>>>         at
>>>> org.apache.nifi.processor.util.put.AbstractPutEventProcessor.
>>>> acquireSender(AbstractPutEventProcessor.java:334)
>>>> ~[nifi-processor-utils-1.1.0.jar:1.1.0]
>>>>         at
>>>> org.apache.nifi.processors.standard.PutTCP.onTrigger(PutTCP.java:176)
>>>> [nifi-standard-processors-1.1.0.jar:1.1.0]
>>>>         at
>>>> org.apache.nifi.controller.StandardProcessorNode.onTrigger(
>>>> StandardProcessorNode.java:1099)
>>>> [nifi-framework-core-1.1.0.jar:1.1.0]
>>>>         at
>>>> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(
>>>> ContinuallyRunProcessorTask.java:136)
>>>> [nifi-framework-core-1.1.0.jar:1.1.0]
>>>>         at
>>>> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(
>>>> ContinuallyRunProcessorTask.java:47)
>>>> [nifi-framework-core-1.1.0.jar:1.1.0]
>>>>         at
>>>> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(
>>>> TimerDrivenSchedulingAgent.java:132)
>>>> [nifi-framework-core-1.1.0.jar:1.1.0]
>>>>         at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>> [na:1.8.0_111]
>>>>         at java.util.concurrent.FutureTask.runAndReset(
>>>> FutureTask.java:308)
>>>> [na:1.8.0_111]
>>>>         at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$
>>>> ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>>> [na:1.8.0_111]
>>>>         at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(
>>>> ScheduledThreadPoolExecutor.java:294)
>>>> [na:1.8.0_111]
>>>>         at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(
>>>> ThreadPoolExecutor.java:1142)
>>>> [na:1.8.0_111]
>>>>         at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>>> ThreadPoolExecutor.java:617)
>>>> [na:1.8.0_111]
>>>>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
>>>>
>>>>
>>>> On an OS level I indeed see the TCP connection originated from Nifi :
>>>>
>>>> netstat -tn | grep 48440
>>>>
>>>> tcp      711      0 10.192.2.90:48440       10.32.133.40:4001
>>>> ESTABLISHED
>>>>
>>>>
>>>> lsof -i TCP:48440
>>>>
>>>> COMMAND  PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
>>>> java    3424 root 1864u  IPv4 404675057      0t0  TCP
>>>> NifiServer:48440->10.32.133.40:newoak (ESTABLISHED)
>>>>
>>>> ps -ef | grep 3424
>>>>
>>>> root      3424  3390  8 11:32 ?        00:44:33 java -classpath
>>>> /opt/nifi-1.1.0/./conf.....
>>>>
>>>> Any ideas on how to debug this further ?
>>>>
>>>> I looked at the code in SocketChannelSender.java:66 and was wondering it
>>>> no
>>>> additional cleanup is necessary when throwing the SocketTimeoutException.
>>>> If
>>>> the connection ends up getting established after the timeout, doesn't
>>>> PutTCP
>>>> need to clean that up ?
>>>>
>>>> Thx
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/
>>>>
>>
>>
>>
>>
>>
>> --
>> Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/

Mime
View raw message