spark-user mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: SaveAsTextFile brings down data nodes with IO Exceptions
Date Fri, 15 May 2015 20:27:13 GMT
What version of Hadoop are you seeing this on?


On 15 May 2015, at 20:03, Puneet Kapoor <puneet.cse.iitd@gmail.com> wrote:

Hey,

Did you find any solution for this issue, we are seeing similar logs in our Data node logs.
Appreciate any help.

2015-05-15 10:51:43,615 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: NttUpgradeDN1:50010:DataXceiver
error processing WRITE_BLOCK operation  src: /192.168.112.190:46253 dst: /192.168.151.104:50010
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready
for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.151.104:50010 remote=/192.168.112.190:46253]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read1(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at java.io.DataInputStream.read(Unknown Source)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:742)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
        at java.lang.Thread.run(Unknown Source)


That's being logged at error level in the DN. It doesn't mean the DN has crashed, only that it
timed out waiting for data: something has gone wrong elsewhere.
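Reading the trace: local=/192.168.151.104:50010 is this datanode's own socket and remote=/192.168.112.190:46253 is the upstream writer, so the DN sat for the full 60000 ms without receiving the next packet from that writer.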


https://issues.apache.org/jira/browse/HDFS-693


There are a couple of properties you can set to extend the timeouts:

<property>
  <name>dfs.socket.timeout</name>
  <value>120000</value>
</property>

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>120000</value>
</property>
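
Note the values are in milliseconds; the 60000 in the stack trace is the read timeout currently in force, so anything meant to extend it has to be larger than that. If you can't easily touch hdfs-site.xml on the cluster, the client side of the write pipeline reads the same keys, so another option is to set them on the job's Hadoop configuration before writing. A rough sketch in Scala, assuming a spark-shell with a SparkContext sc and an RDD rdd; the 120000 value and the output path are made-up examples, and the datanodes themselves still use whatever is in their own hdfs-site.xml:

val conf = sc.hadoopConfiguration        // the job's org.apache.hadoop.conf.Configuration
conf.setInt("dfs.socket.timeout", 120000)                  // socket read timeout, ms
conf.setInt("dfs.datanode.socket.write.timeout", 120000)   // socket write timeout, ms
rdd.saveAsTextFile("hdfs:///tmp/example-output")           // path is just an example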


You can also increase the number of datanode transceiver threads available to handle data IO across
the network:


<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

Yes, that property really is spelled that way ("xcievers"); it's easy to get wrong.
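(On Hadoop 2.x the same limit also has a correctly spelled alias, dfs.datanode.max.transfer.threads; the misspelled key still works there but is deprecated.)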

