spark-issues mailing list archives

From "Deegue (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-28239) Make TCP connections created by shuffle service auto close on YARN NodeManagers
Date Wed, 03 Jul 2019 08:31:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-28239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deegue updated SPARK-28239:
---------------------------
    Description: 
When executing shuffle tasks, the shuffle service establishes TCP connections (on port 7337 by default).
They look like this:

 !screenshot-1.png! 

However, some of these TCP connections remain busy even after the task has finished. These
connections do not close until we restart the NodeManager process.

The connections pile up, and the NodeManagers become slower and slower.

 !screenshot-2.png! 
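To watch this pile-up on a NodeManager host, one option is to count connections on the shuffle port grouped by TCP state. The following is a minimal Java sketch, assuming the input is the text output of Linux "netstat -tn" (columns Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The class and method names are hypothetical, and the sample lines are made up for illustration; they are not taken from the screenshots.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleConnCount {
    // Default port of the external shuffle service (spark.shuffle.service.port).
    static final int SHUFFLE_PORT = 7337;

    /**
     * Counts TCP connections whose local address uses the given port, grouped by state.
     * Assumes netstat -tn style lines: Proto Recv-Q Send-Q Local Foreign State.
     */
    static Map<String, Integer> countByState(List<String> netstatLines, int port) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for stable output
        String suffix = ":" + port;
        for (String line : netstatLines) {
            String[] f = line.trim().split("\\s+");
            // Skip headers and non-TCP lines ("tcp" and "tcp6" both match).
            if (f.length < 6 || !f[0].startsWith("tcp")) continue;
            // Only connections served on the shuffle port (local side).
            if (!f[3].endsWith(suffix)) continue;
            counts.merge(f[5], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, for illustration only.
        List<String> sample = List.of(
            "Proto Recv-Q Send-Q Local Address           Foreign Address         State",
            "tcp        0      0 10.0.0.1:7337           10.0.0.2:51234          ESTABLISHED",
            "tcp        0      0 10.0.0.1:7337           10.0.0.3:40110          ESTABLISHED",
            "tcp        0      0 10.0.0.1:8042           10.0.0.4:33000          ESTABLISHED");
        System.out.println(countByState(sample, SHUFFLE_PORT)); // {ESTABLISHED=2}
    }
}
```

Running this periodically and seeing the per-state counts only grow would match the symptom described above.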

These unclosed TCP connections stay busy, and setting ChannelOption.SO_KEEPALIVE to true as done in
[SPARK-23182|https://github.com/apache/spark/pull/20512] does not seem to take effect.

So the solution is to set ChannelOption.AUTO_CLOSE to true. Since this change, our cluster (running
10,000+ jobs per day) has been processing normally.
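At the socket level, ChannelOption.SO_KEEPALIVE corresponds to the standard TCP SO_KEEPALIVE option, whereas ChannelOption.AUTO_CLOSE is a Netty channel setting (per Netty's ChannelConfig documentation, it closes the channel automatically on a write failure) with no JDK socket-option counterpart. The following is a minimal sketch using only JDK NIO, showing roughly what a Netty server's childOption(ChannelOption.SO_KEEPALIVE, true) applies to each accepted connection. It binds an ephemeral loopback port instead of 7337, and the class and method names are hypothetical; this is not Spark's TransportServer code.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class KeepAliveSketch {
    /** Accepts one loopback connection, enables SO_KEEPALIVE on it, and reads the option back. */
    static boolean acceptWithKeepAlive() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 = ephemeral port, so this cannot clash with a real shuffle service on 7337.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress()); // blocking connect
                 SocketChannel accepted = server.accept()) {                          // server-side socket
                // Per-connection option, analogous to Netty's childOption(SO_KEEPALIVE, true).
                accepted.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
                return accepted.getOption(StandardSocketOptions.SO_KEEPALIVE);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("SO_KEEPALIVE enabled: " + acceptWithKeepAlive());
    }
}
```

Keep in mind that TCP keep-alive only starts probing after the kernel's idle threshold (on Linux, net.ipv4.tcp_keepalive_time, 7200 seconds by default) and only tears down connections whose peer stops responding.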


> Make TCP connections created by shuffle service auto close on YARN NodeManagers
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-28239
>                 URL: https://issues.apache.org/jira/browse/SPARK-28239
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, YARN
>    Affects Versions: 2.4.0
>         Environment: Hadoop 2.6.0-CDH5.8.3 (Netty 3)
> Spark 2.4.0 (Netty 4)
> Configs:
> spark.shuffle.service.enabled=true
>            Reporter: Deegue
>            Priority: Minor
>         Attachments: screenshot-1.png, screenshot-2.png
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


