spark-user mailing list archives

From Jiří Syrový <syrovy.j...@gmail.com>
Subject Re: Can't zip RDDs with unequal numbers of partitions
Date Fri, 18 Mar 2016 08:55:06 GMT
Unfortunately I can't share a snippet quickly because the code is
generated, but for now I can at least share the plan (see
http://pastebin.dqd.cz/RAhm/).

After I increased spark.sql.autoBroadcastJoinThreshold from 100000 to
300000, it went through without any problems. With 100000 it always
failed during the "planning" phase with the exception quoted below.
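
For readers who want to try the same workaround, here is a minimal sketch of pinning the threshold in a properties file. Putting it in spark-defaults.conf is an assumption; the message does not say where the parameter was set. The value is the one reported to work in this thread, and the setting is expressed in bytes.

```properties
# spark-defaults.conf (assumed location; the thread does not say how the value was set)
# Threshold in bytes: tables estimated to be smaller than this are broadcast in joins.
# 300000 is the value that worked in this thread; 100000 triggered the exception.
spark.sql.autoBroadcastJoinThreshold  300000
```

The same key/value pair can also be passed per job with spark-submit's --conf option instead of editing the defaults file.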

2016-03-17 22:05 GMT+01:00 Jakob Odersky <jakob@odersky.com>:

> Can you share a snippet that reproduces the error? What was
> spark.sql.autoBroadcastJoinThreshold before your last change?
>
> On Thu, Mar 17, 2016 at 10:03 AM, Jiří Syrový <syrovy.jiri@gmail.com>
> wrote:
> > Hi,
> >
> > any idea what could be causing this issue? It started appearing after
> > changing parameter
> >
> >     spark.sql.autoBroadcastJoinThreshold to 100000
> >
> >
> > Caused by: java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
> >         at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:57)
> >         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> >         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> >         at scala.Option.getOrElse(Option.scala:120)
> >         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> >         at org.apache.spark.rdd.PartitionCoalescer.<init>(CoalescedRDD.scala:172)
> >         at org.apache.spark.rdd.CoalescedRDD.getPartitions(CoalescedRDD.scala:85)
> >         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> >         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> >         at scala.Option.getOrElse(Option.scala:120)
> >         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> >         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> >         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> >         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> >         at scala.Option.getOrElse(Option.scala:120)
> >         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> >         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> >         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> >         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> >         at scala.Option.getOrElse(Option.scala:120)
> >         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> >         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> >         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
> >         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
> >         at scala.Option.getOrElse(Option.scala:120)
> >         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> >         at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
> >         at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:220)
> >         at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
> >         at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
> >         at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
> >         ... 28 more
> >