spark-user mailing list archives

From Ivan Héda <ivan.h...@gmail.com>
Subject ExecutorLostFailure when working with RDDs
Date Fri, 09 Oct 2015 13:13:47 GMT
Hi,

I'm facing an issue with PySpark (1.5.1, 1.6.0-SNAPSHOT) running over YARN
(2.6.0-cdh5.4.4). Everything seems fine when working with DataFrames, but
when I need RDDs the workers start to fail, as in the following code:

table1 = sqlContext.table('someTable')
table1.count()  ## OK -- approx. 500 million rows

table1.groupBy(table1.field).count().show()  ## no problem

table1.rdd.count()  ## fails with the driver error below

# Py4JJavaError: An error occurred while calling
# z:org.apache.spark.api.python.PythonRDD.collectAndServe.

# : org.apache.spark.SparkException: Job aborted due to stage failure:
# Task 23 in stage 117.0 failed 4 times, most recent failure: Lost task
# 23.3 in stage 117.0 (TID 23836, some_host): ExecutorLostFailure
# (executor 2446 lost)

The affected workers fail with the following log:

15/10/09 14:56:59 WARN TransportChannelHandler: Exception in
connection from host/ip:port
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
	at sun.nio.ch.IOUtil.read(IOUtil.java:192)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
	at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
	at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)


RDD operations work as expected if I set
conf.set("spark.shuffle.blockTransferService", "nio").

Since "nio" is deprecated, I'm looking for a better solution. Any ideas?
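One direction I'm considering, assuming the executors are being killed by YARN for exceeding their memory limit (PySpark RDD jobs can trigger this, since the Python workers' memory is not covered by spark.executor.memory), is raising the off-heap overhead. A sketch under that assumption -- the value here is illustrative, not tuned:

```python
from pyspark import SparkConf

# Hypothetical tuning sketch: spark.yarn.executor.memoryOverhead (in MB for
# Spark 1.5) reserves extra container memory beyond the JVM heap, which is
# where the Python worker processes live. 2048 is a guess, not a tuned value.
conf = (SparkConf()
        .set("spark.yarn.executor.memoryOverhead", "2048"))

# sc = SparkContext(conf=conf)  # then rerun table1.rdd.count()
```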

Thanks in advance

ih
