spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: Cancelled Key Exceptions on Massive Join
Date Sun, 16 Nov 2014 17:56:11 GMT
This usually happens when one of the workers is stuck in a GC pause and
times out. Set the following properties on the SparkConf while creating the
SparkContext:

 conf.set("spark.rdd.compress", "true")
 conf.set("spark.storage.memoryFraction", "1")
 conf.set("spark.core.connection.ack.wait.timeout", "6000")
 conf.set("spark.akka.frameSize", "100")
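Spelled out, a minimal sketch of wiring these in (the app name is a
placeholder; the properties are read once, when the context starts):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: the four properties above must be set on the SparkConf
// before the SparkContext is constructed; they are read at startup.
val conf = new SparkConf()
  .setAppName("massive-join") // placeholder app name
  .set("spark.rdd.compress", "true")
  .set("spark.storage.memoryFraction", "1")
  .set("spark.core.connection.ack.wait.timeout", "6000")
  .set("spark.akka.frameSize", "100")

val sc = new SparkContext(conf)
```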



Thanks
Best Regards

On Sat, Nov 15, 2014 at 12:46 AM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com> wrote:

> Hello all. I have been running a Spark Job that eventually needs to do a
> large join.
>
> 24 million x 150 million
>
> A broadcast join is infeasible in this instance clearly, so I am instead
> attempting to do it with Hash Partitioning by defining a custom partitioner
> as:
>
>
> class RDD2Partitioner(partitions: Int) extends HashPartitioner(partitions) {
>
>   override def getPartition(key: Any): Int = key match {
>     // The Tuple2[Int, String] type arguments are erased at runtime, so
>     // destructure the tuple and match on the first element's type instead.
>     case (id: Int, _) => super.getPartition(id)
>     case _ => super.getPartition(key)
>   }
>
> }
>
> I then partition both RDDs using this partitioner. However, the job
> eventually fails with the following exception which, if I had to guess,
> indicates that a network connection was interrupted during the shuffle
> stage, causing blocks to get lost and ultimately resulting in a fetch
> failure:
>
> 14/11/14 12:56:21 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(innovationdatanode08.cof.ds.capitalone.com,37590)
> 14/11/14 12:56:21 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@7369b398
> 14/11/14 12:56:21 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@7369b398
> java.nio.channels.CancelledKeyException
> 	at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
> 	at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
>
>
> In the Spark UI I still see a substantial amount of shuffling going on at
> this stage, so I am wondering whether I am using the partitioner
> incorrectly.
>
>
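For reference, the approach described above — partitioning both sides with
the same partitioner instance and then joining — keeps the two RDDs
co-partitioned, so the join itself adds no extra shuffle. A sketch, with
hypothetical RDD names, value types, and partition count:

```scala
import org.apache.spark.rdd.RDD

// Sketch: co-partition both sides with the SAME partitioner instance,
// then join. `left` and `right` are hypothetical RDDs keyed by
// (Int, String) pairs; persisting avoids redoing the partitioning
// shuffle if the partitioned RDDs are reused.
def coPartitionedJoin(
    left: RDD[((Int, String), Long)],
    right: RDD[((Int, String), Long)]): RDD[((Int, String), (Long, Long))] = {
  val partitioner = new RDD2Partitioner(2000) // partition count: tune to cluster
  val l = left.partitionBy(partitioner).persist()
  val r = right.partitionBy(partitioner).persist()
  l.join(r) // both sides share the partitioner, so no reshuffle here
}
```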
