spark-user mailing list archives

From Suman Somasundar <suman.somasun...@oracle.com>
Subject Connection closed error while running Terasort
Date Tue, 01 Sep 2015 00:13:14 GMT
Hi,

I am getting the following error while running a 10 GB TeraSort on YARN with 8 nodes.

The command is:

spark-submit --class com.github.ehiggs.spark.terasort.TeraSort \
  --master yarn-cluster \
  --num-executors 10 \
  --executor-memory 32g \
  spark-terasort-master/target/spark-terasort-1.0-SNAPSHOT-jar-with-dependencies.jar \
  hdfs://hadoop-solaris-a:8020/user/hadoop/terasort/input-10 \
  hdfs://hadoop-solaris-a:8020/user/hadoop/terasort/output-10

What might be causing this error?

15/08/31 17:09:48 ERROR server.TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1867783019052, chunkIndex=0}, buffer=FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/hadoop/appcache/application_1441064487503_0001/blockmgr-c3c8dbb3-9ae2-4e45-b537-fd0beeff98b5/3e/shuffle_1_9_0.data, offset=0, length=1059423784}} to /199.199.35.5:52486; closing connection
java.io.IOException: Broken pipe
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:443)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:575)
        at org.apache.spark.network.buffer.LazyFileRegion.transferTo(LazyFileRegion.java:96)
        at org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:89)
        at io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:237)
        at io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:233)
        at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:264)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:707)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.forceFlush(AbstractNioChannel.java:321)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:519)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at java.lang.Thread.run(Thread.java:745)

15/08/31 17:10:48 ERROR server.TransportChannelHandler: Connection to hadoop-solaris-c/199.199.35.4:48540 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.

15/08/31 17:10:48 ERROR client.TransportResponseHandler: Still have 1 requests outstanding when connection from hadoop-solaris-c/199.199.35.4:48540 is closed

15/08/31 17:10:48 INFO shuffle.RetryingBlockFetcher: Retrying fetch (3/3) for 1 outstanding blocks after 5000 ms

15/08/31 17:10:49 ERROR server.TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1867783019053, chunkIndex=0}, buffer=FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/hadoop/appcache/application_1441064487503_0001/blockmgr-c3c8dbb3-9ae2-4e45-b537-fd0beeff98b5/1b/shuffle_1_6_0.data, offset=0, length=1052128440}} to /199.199.35.6:45201; closing connection
java.nio.channels.ClosedChannelException

15/08/31 17:10:53 INFO client.TransportClientFactory: Found inactive connection to hadoop-solaris-c/199.199.35.4:48540, creating a new one.

15/08/31 17:11:31 ERROR server.TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1867783019054, chunkIndex=0}, buffer=FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/hadoop/appcache/application_1441064487503_0001/blockmgr-c3c8dbb3-9ae2-4e45-b537-fd0beeff98b5/1b/shuffle_1_6_0.data, offset=0, length=1052128440}} to /199.199.35.10:55082; closing connection
java.nio.channels.ClosedChannelException

15/08/31 17:11:31 ERROR server.TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1867783019055, chunkIndex=0}, buffer=FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/hadoop/appcache/application_1441064487503_0001/blockmgr-c3c8dbb3-9ae2-4e45-b537-fd0beeff98b5/3e/shuffle_1_9_0.data, offset=0, length=1059423784}} to /199.199.35.7:54328; closing connection
java.nio.channels.ClosedChannelException

15/08/31 17:11:53 ERROR server.TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1867783019056, chunkIndex=0}, buffer=FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/hadoop/appcache/application_1441064487503_0001/blockmgr-c3c8dbb3-9ae2-4e45-b537-fd0beeff98b5/3e/shuffle_1_9_0.data, offset=0, length=1059423784}} to /199.199.35.5:50573; closing connection
java.nio.channels.ClosedChannelException

15/08/31 17:12:54 ERROR server.TransportChannelHandler: Connection to hadoop-solaris-c/199.199.35.4:48540 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.

15/08/31 17:12:54 ERROR client.TransportResponseHandler: Still have 1 requests outstanding when connection from hadoop-solaris-c/199.199.35.4:48540 is closed

15/08/31 17:12:54 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block shuffle_1_7_7, and will not retry (3 retries)
java.io.IOException: Connection from hadoop-solaris-c/199.199.35.4:48540 closed
        at org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:104)
        at org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:91)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
        at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:738)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$6.run(AbstractChannel.java:606)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at java.lang.Thread.run(Thread.java:745)

15/08/31 17:12:54 ERROR storage.ShuffleBlockFetcherIterator: Failed to get block(s) from hadoop-solaris-c:48540
java.io.IOException: Connection from hadoop-solaris-c/199.199.35.4:48540 closed
        at org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:104)
        at org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:91)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:183)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:169)
        at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:738)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$6.run(AbstractChannel.java:606)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at java.lang.Thread.run(Thread.java:745)

15/08/31 17:12:54 ERROR server.TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1867783019057, chunkIndex=0}, buffer=FileSegmentManagedBuffer{file=/tmp/hadoop/nm-local-dir/usercache/hadoop/appcache/application_1441064487503_0001/blockmgr-c3c8dbb3-9ae2-4e45-b537-fd0beeff98b5/1b/shuffle_1_6_0.data, offset=0, length=1052128440}} to /199.199.35.6:45044; closing connection
java.nio.channels.ClosedChannelException
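
In case it helps with reading the log, the size of each shuffle segment that failed to send can be pulled out of the ERROR lines with a short script. This is only a sketch, not part of the job itself: the function name chunk_sizes is made up for illustration, the sample text is copied from two of the entries above, and the regex assumes the FileSegmentManagedBuffer{file=..., offset=..., length=...} layout shown in this log.

```python
import re

# Matches the shuffle data file name and the segment length inside a
# FileSegmentManagedBuffer{...} entry. \s* tolerates a line wrap between fields.
SEGMENT_RE = re.compile(
    r"(shuffle_\d+_\d+_\d+)\.data,\s*offset=\d+,\s*length=(\d+)"
)


def chunk_sizes(log_text):
    """Return {shuffle file name: segment length in bytes} found in log_text.

    Hypothetical helper for illustration; repeated entries for the same
    file simply overwrite each other.
    """
    return {name: int(length) for name, length in SEGMENT_RE.findall(log_text)}


# Two excerpts copied from the log above (wrapped as in the original mail).
sample = """\
blockmgr-c3c8dbb3-9ae2-4e45-b537-fd0beeff98b5/3e/shuffle_1_9_0.data,
offset=0, length=1059423784}} to /199.199.35.5:52486; closing connection
blockmgr-c3c8dbb3-9ae2-4e45-b537-fd0beeff98b5/1b/shuffle_1_6_0.data,
offset=0, length=1052128440}} to /199.199.35.6:45201; closing connection
"""

sizes = chunk_sizes(sample)
for name, nbytes in sorted(sizes.items()):
    # 2**30 bytes per GiB
    print(f"{name}: {nbytes} bytes ({nbytes / 2**30:.2f} GiB)")
```

Both segments in the sample come out at a little under 1 GiB each, so every failed transfer here is a single chunk of roughly that size.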

 

 

Thanks,
Suman.
