spark-user mailing list archives

From "Shao, Saisai" <saisai.s...@intel.com>
Subject RE: Issues with partitionBy: FetchFailed
Date Sun, 21 Sep 2014 11:54:39 GMT
Hi,

I’ve also run into this problem before. You can try setting “spark.core.connection.ack.wait.timeout” to a larger value to avoid the ack timeout; the default is 60 seconds.

Sometimes, because of a GC pause or other reasons, the acknowledgement message times out, which leads to this exception. Increasing this configuration value should avoid it.
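
For example, here is a minimal sketch of raising the timeout when constructing the context (the 300-second value and the app name are only illustrative placeholders, not values from this thread):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("repartition-job")                          // hypothetical app name
  .set("spark.core.connection.ack.wait.timeout", "300")   // timeout in seconds; 300 is an example value
val sc = new SparkContext(conf)

The same property can also be passed at submit time, e.g. spark-submit --conf spark.core.connection.ack.wait.timeout=300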

Thanks
Jerry

From: Julien Carme [mailto:julien.carme@gmail.com]
Sent: Sunday, September 21, 2014 7:43 PM
To: user@spark.apache.org
Subject: Issues with partitionBy: FetchFailed

Hello,
I am facing an issue with partitionBy; it is not clear whether the problem is in my code or in my Spark setup. I am using Spark 1.1 in standalone mode, and my other Spark projects work fine.
I need to repartition a relatively large file (about 70 million lines). Here is a minimal version of the code that is not working:
import org.apache.spark.HashPartitioner

val myRDD = sc.textFile("...").map { line => (extractKey(line), line) }
val myRepartitionedRDD = myRDD.partitionBy(new HashPartitioner(100))
myRepartitionedRDD.saveAsTextFile(...)
It runs for quite some time, until I get some errors and it starts retrying. The errors are:
FetchFailed(BlockManagerId(3,myWorker2, 52082,0), shuffleId=1,mapId=1,reduceId=5)
The logs are not much more informative. I get:

java.io.IOException: sendMessageReliably failed because ack was not received within 60 sec

I get similar errors with all my workers.
Do you have some kind of explanation for this behaviour? What could be wrong?
Thanks,

