spark-user mailing list archives

From Harsh Sharma <harshsharma21...@gmail.com>
Subject Re: Connection reset by peer : failed to remove cache rdd
Date Thu, 02 Sep 2021 05:51:46 GMT


On 2021/08/30 13:32:19, Jacek Laskowski <jacek@japila.pl> wrote: 
> Hi,
> 
> No idea what might be going on here, but I wouldn't worry much about it and
> would simply monitor disk usage, as some broadcast blocks might have been
> left behind.
> 
> Do you know when in your application lifecycle it happens? Spark SQL or
> Structured Streaming? Do you use broadcast variables or are the errors
> coming from broadcast joins perhaps?
> 
> Regards,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
> 
> On Mon, Aug 30, 2021 at 3:26 PM Harsh Sharma <harshsharma21189@gmail.com>
> wrote:
> 
> > We are facing an issue in production where we frequently get the error
> >
> > Still have 1 request outstanding when connection with the hostname was
> > closed
> >
> > as well as "connection reset by peer" errors and warnings such as "failed
> > to remove cache rdd" or "failed to remove broadcast variable".
> >
> > Please help us mitigate this:
> >
> > Executor memory : 12g
> >
> > Network timeout : 600000
> >
> > Heartbeat interval : 250000
> >
> >
> >
> > [Stage 284:============>(199 + 1) / 200][Stage 292:>              (1 + 3) / 200]
> > [Stage 284:============>(199 + 1) / 200][Stage 292:>              (2 + 3) / 200]
> > [Stage 292:>                                                      (2 + 4) / 200]
> > [14/06/21 10:46:17,006 WARN shuffle-server-4](TransportChannelHandler) Exception in connection from <hostname>
> > java.io.IOException: Connection reset by peer
> >         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> >         at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> >         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> >         at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> >         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> >         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> >         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> >         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> >         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> >         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> >         at java.lang.Thread.run(Thread.java:748)
> > [14/06/21 10:46:17,010 ERROR shuffle-server-4](TransportResponseHandler) Still have 1 requests outstanding when connection from <hostname> is closed
> > [14/06/21 10:46:17,012 ERROR Spark Context Cleaner](ContextCleaner) Error cleaning broadcast 159
> > java.io.IOException: Connection reset by peer
> >         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> >         at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> >         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> >         at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> >         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> >         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> >         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> >         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> >         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> >         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> >         at java.lang.Thread.run(Thread.java:748)
> > [14/06/21 10:46:17,012 WARN block-manager-ask-thread-pool-69](BlockManagerMaster) Failed to remove broadcast 159 with removeFromMaster = true - Connection reset by peer
> > java.io.IOException: Connection reset by peer
> >         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> >         at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> >         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> >         at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> >         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> >         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> >         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> >         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> >         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> >         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> >         at java.lang.Thread.run(Thread.java:748)
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> >
> >
> 
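
For readers following the thread: the three settings quoted above (executor memory, network timeout, heartbeat interval) most likely correspond to the Spark properties spark.executor.memory, spark.network.timeout and spark.executor.heartbeatInterval. The original post gives the two time values without units, so the "ms" suffix below is an assumption, not something the poster confirmed. The helper names (to_millis, heartbeat_is_safe, to_submit_args) are hypothetical and exist only for this sketch. It checks the documented guideline that the heartbeat interval should stay well below the network timeout and renders the settings as spark-submit arguments.

```python
import re
from typing import Dict, List

# Milliseconds per unit for the duration suffixes handled by this sketch.
_UNIT_TO_MS = {"ms": 1, "s": 1_000, "min": 60_000, "m": 60_000, "h": 3_600_000}


def to_millis(value: str) -> int:
    """Convert a duration with an explicit unit (e.g. '600000ms', '10s') to ms."""
    match = re.fullmatch(r"(\d+)(ms|s|min|m|h)", value.strip())
    if match is None:
        raise ValueError(f"expected a duration with an explicit unit, got {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_TO_MS[unit]


# Values from the original post; the 'ms' units are an ASSUMPTION.
conf: Dict[str, str] = {
    "spark.executor.memory": "12g",
    "spark.network.timeout": "600000ms",
    "spark.executor.heartbeatInterval": "250000ms",
}


def heartbeat_is_safe(settings: Dict[str, str]) -> bool:
    """Return True if the heartbeat interval is shorter than the network timeout."""
    heartbeat = to_millis(settings["spark.executor.heartbeatInterval"])
    timeout = to_millis(settings["spark.network.timeout"])
    return heartbeat < timeout


def to_submit_args(settings: Dict[str, str]) -> List[str]:
    """Render the settings as '--conf key=value' pairs for spark-submit."""
    args: List[str] = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print("heartbeat below network timeout:", heartbeat_is_safe(conf))
    print(" ".join(to_submit_args(conf)))
```

Writing the units out explicitly removes any ambiguity about how a bare number would be interpreted for each property. Whether these particular values relate to the "Connection reset by peer" errors is not established anywhere in this thread.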
