spark-user mailing list archives

From Harsh Sharma <harshsharma21...@gmail.com>
Subject Re: Connection reset by peer : failed to remove cache rdd
Date Thu, 02 Sep 2021 07:08:04 GMT


On 2021/09/02 06:00:26, Harsh Sharma <harshsharma21189@gmail.com> wrote: 
> Please find my replies below:
>
> > Do you know when in your application lifecycle it happens? Spark SQL or
> > Structured Streaming?
>
> Ans: It's Spark SQL.
>
> > Do you use broadcast variables?
>
> Ans: Yes, we are using broadcast variables.
>
> > Or are the errors coming from broadcast joins perhaps?
Not sure about this.
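
One way to settle the broadcast-join question is to look at the physical plan of the affected queries. The sketch below is only an illustration: it assumes the plan text (what `df.explain()` prints) has already been copied into a string, and it simply counts the broadcast-related operator names Spark SQL uses in physical plans. The helper name `find_broadcast_operators` and the sample plan are hypothetical and do not come from this thread.

```python
import re

# Physical-plan operator names that indicate broadcasting in Spark SQL.
BROADCAST_OPERATORS = (
    "BroadcastHashJoin",
    "BroadcastNestedLoopJoin",
    "BroadcastExchange",
)


def find_broadcast_operators(plan_text):
    """Return {operator name: occurrence count} for operators found in the plan."""
    counts = {}
    for name in BROADCAST_OPERATORS:
        # Whole-word match so longer identifiers are not counted by accident.
        n = len(re.findall(r"\b" + name + r"\b", plan_text))
        if n:
            counts[name] = n
    return counts


# Hypothetical plan text for illustration only (not taken from the thread).
sample_plan = """\
== Physical Plan ==
*(2) BroadcastHashJoin [id#0L], [id#4L], Inner, BuildRight
:- *(2) Range (0, 100, step=1, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
   +- *(1) Range (0, 10, step=1, splits=8)
"""

print(find_broadcast_operators(sample_plan))
```

If such operators do show up, one experiment worth trying is setting `spark.sql.autoBroadcastJoinThreshold` to `-1`, which disables automatic broadcast joins, and checking whether the "Error cleaning broadcast" messages become less frequent. That would only narrow down the source; it is not a confirmed fix for the connection resets.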

> 
> On 2021/08/30 13:32:19, Jacek Laskowski <jacek@japila.pl> wrote: 
> > Hi,
> > 
> > No idea what might be going on here, but I wouldn't worry much about it and
> > would simply monitor disk usage, as some broadcast blocks might be left over.
> > 
> > Do you know when in your application lifecycle it happens? Spark SQL or
> > Structured Streaming? Do you use broadcast variables or are the errors
> > coming from broadcast joins perhaps?
> > 
> > Pozdrawiam,
> > Jacek Laskowski
> > ----
> > https://about.me/JacekLaskowski
> > "The Internals Of" Online Books <https://books.japila.pl/>
> > Follow me on https://twitter.com/jaceklaskowski
> > 
> > 
> > On Mon, Aug 30, 2021 at 3:26 PM Harsh Sharma <harshsharma21189@gmail.com>
> > wrote:
> > 
> > > We are facing an issue in production where we frequently get these
> > > errors:
> > >
> > > Still have 1 request outstanding when connection with the hostname was
> > > closed
> > >
> > > Connection reset by peer
> > >
> > > as well as these warnings: failed to remove cached RDD, or failed to
> > > remove broadcast variable.
> > >
> > > Please help us mitigate this. Our settings are:
> > >
> > > Executor memory: 12g
> > >
> > > Network timeout: 600000
> > >
> > > Heartbeat interval: 250000
> > >
> > >
> > >
> > > [Stage 284:============>(199 + 1) / 200][Stage 292:>              (1 + 3) / 200]
> > > [Stage 284:============>(199 + 1) / 200][Stage 292:>              (2 + 3) / 200]
> > > [Stage 292:>                                                      (2 + 4) / 200]
> > > [14/06/21 10:46:17,006 WARN shuffle-server-4](TransportChannelHandler) Exception in connection from <hostname>
> > > java.io.IOException: Connection reset by peer
> > >         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > >         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> > >         at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> > >         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> > >         at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> > >         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> > >         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > >         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > >         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > >         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > >         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > >         at java.lang.Thread.run(Thread.java:748)
> > > [14/06/21 10:46:17,010 ERROR shuffle-server-4](TransportResponseHandler) Still have 1 requests outstanding when connection from <hostname> is closed
> > > [14/06/21 10:46:17,012 ERROR Spark Context Cleaner](ContextCleaner) Error cleaning broadcast 159
> > > java.io.IOException: Connection reset by peer
> > >         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > >         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> > >         at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> > >         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> > >         at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> > >         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> > >         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > >         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > >         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > >         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > >         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > >         at java.lang.Thread.run(Thread.java:748)
> > > [14/06/21 10:46:17,012 WARN block-manager-ask-thread-pool-69](BlockManagerMaster) Failed to remove broadcast 159 with removeFromMaster = true - Connection reset by peer
> > > java.io.IOException: Connection reset by peer
> > >         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > >         at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
> > >         at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
> > >         at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
> > >         at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
> > >         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
> > >         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> > >         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> > >         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> > >         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> > >         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> > >         at java.lang.Thread.run(Thread.java:748)
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> > >
> > >
> > 
> 
