spark-user mailing list archives

From Ilya Ganelin <ilgan...@gmail.com>
Subject Re: SaveAsTextFile brings down data nodes with IO Exceptions
Date Sat, 16 May 2015 07:40:37 GMT
All - this issue showed up when I was tearing down a Spark context and
creating a new one. Afterwards, I was often unable to write to HDFS due to
this error. I subsequently switched to a different implementation: instead
of tearing down and re-initializing the Spark context, I submit each
request to YARN as a separate application.
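
A minimal sketch of that workaround, assuming Spark 1.4+ where the
SparkLauncher API is available; the jar path, main class, and memory
setting below are placeholders:

import org.apache.spark.launcher.SparkLauncher

object SubmitPerRequest {
  def main(args: Array[String]): Unit = {
    // Launch each request as a fresh YARN application instead of
    // stopping and re-creating a SparkContext inside this JVM.
    val app = new SparkLauncher()
      .setAppResource("/path/to/my-job.jar")        // placeholder jar
      .setMainClass("com.example.MyJob")            // placeholder main class
      .setMaster("yarn-cluster")                    // one YARN app per request
      .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g") // placeholder sizing
      .launch()                                     // returns a java.lang.Process
    app.waitFor()                                   // block until the app exits
  }
}

Each submission gets its own driver and executors, so no context is torn
down and reused in-process.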
On Fri, May 15, 2015 at 2:35 PM Puneet Kapoor <puneet.cse.iitd@gmail.com>
wrote:

> I am seeing this on Hadoop 2.4.0.
>
> Thanks for your suggestions, I will try those and let you know if they
> help!
>
> On Sat, May 16, 2015 at 1:57 AM, Steve Loughran <stevel@hortonworks.com>
> wrote:
>
>>  What version of Hadoop are you seeing this on?
>>
>>
>>  On 15 May 2015, at 20:03, Puneet Kapoor <puneet.cse.iitd@gmail.com>
>> wrote:
>>
>>  Hey,
>>
>>  Did you find any solution for this issue? We are seeing similar logs in
>> our DataNode logs. Appreciate any help.
>>
>>  2015-05-15 10:51:43,615 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode:
>> NttUpgradeDN1:50010:DataXceiver error processing WRITE_BLOCK operation
>>  src: /192.168.112.190:46253 dst: /192.168.151.104:50010
>> java.net.SocketTimeoutException: 60000 millis timeout while waiting for
>> channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/192.168.151.104:50010
>> remote=/192.168.112.190:46253]
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>>         at java.io.BufferedInputStream.fill(Unknown Source)
>>         at java.io.BufferedInputStream.read1(Unknown Source)
>>         at java.io.BufferedInputStream.read(Unknown Source)
>>         at java.io.DataInputStream.read(Unknown Source)
>>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:742)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
>>         at
>> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
>>         at java.lang.Thread.run(Unknown Source)
>>
>>
>>  That's being logged at error level in the DN. It doesn't mean the DN has
>> crashed, only that it timed out waiting for data: something has gone wrong
>> elsewhere.
>>
>>  https://issues.apache.org/jira/browse/HDFS-693
>>
>>
>> There are a couple of properties you can set to extend the timeouts (the
>> values are in milliseconds):
>>
>>   <property>
>>     <name>dfs.socket.timeout</name>
>>     <value>20000</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.datanode.socket.write.timeout</name>
>>     <value>20000</value>
>>   </property>
>>
>> You can also increase the number of DataNode transceiver threads that
>> handle data IO across the network:
>>
>>
>>   <property>
>>     <name>dfs.datanode.max.xcievers</name>
>>     <value>4096</value>
>>   </property>
>>
>> Yes, that property really is spelled "xcievers"; it's easy to get wrong.
>> (On Hadoop 2.x it was renamed dfs.datanode.max.transfer.threads; the old
>> name still works as a deprecated alias.)
>>
>>
>
