spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-5385) Calling textFile, parallelize, zip, then partitions causes failure on some local[*]
Date Fri, 23 Jan 2015 18:00:43 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289610#comment-14289610 ]

Sean Owen commented on SPARK-5385:
----------------------------------

The number of partitions is not the problem, although several methods (including coalesce
and repartition) let you change it. The issue is that corresponding partitions of the two
RDDs do not all contain the same number of elements.
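
To make the two requirements concrete, here is a minimal sketch in plain Python. It is a toy
model, not Spark code: split_evenly and zip_partitions are hypothetical helpers, and the even
slicing scheme is only an assumption about how a parallelized collection might be laid out.
It shows that two datasets can agree on the partition count and still fail to zip because
the per-partition element counts differ.

```python
def split_evenly(items, num_partitions):
    """Slice items into contiguous, near-equal chunks (assumed layout, not Spark's code)."""
    n = len(items)
    return [items[i * n // num_partitions:(i + 1) * n // num_partitions]
            for i in range(num_partitions)]


def zip_partitions(left, right):
    """Pair up elements partition by partition, checking both requirements first."""
    # Requirement 1: both sides have the same number of partitions.
    if len(left) != len(right):
        raise ValueError("unequal numbers of partitions")
    # Requirement 2: each pair of corresponding partitions has the same size.
    for idx, (l, r) in enumerate(zip(left, right)):
        if len(l) != len(r):
            raise ValueError(f"partition {idx} has {len(l)} vs {len(r)} elements")
    return [list(zip(l, r)) for l, r in zip(left, right)]


# Six numbers split evenly into three partitions of two elements each.
numbers = split_evenly(list(range(6)), 3)

# Made-up "lines of a file": also three partitions, but split unevenly,
# as could happen when boundaries follow file blocks rather than line counts.
lines_by_block = [["a", "b", "c"], ["d"], ["e", "f"]]

try:
    zip_partitions(numbers, lines_by_block)
except ValueError as err:
    print(err)  # partition 0 has 2 vs 3 elements
```

In this model, matching the partition counts (the analogue of coalesce or repartition) would
not help; the element counts inside each partition have to line up as well.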

> Calling textFile, parallelize, zip, then partitions causes failure on some local[*]
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-5385
>                 URL: https://issues.apache.org/jira/browse/SPARK-5385
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Pedro Rodriguez
>
> There is a bug in Spark core which produces the exception: "Can't zip RDDs with unequal
> numbers of partitions"
> General Steps to reproduce:
> 1. Run sc.textFile
> 2. Run sc.parallelize
> 3. Zip the results of the two steps above
> 4. Call partitions on the result of zip
> 5. Run for local, local[2], local[3],...
> 6. My machine (MacBook Air) fails on local[3].
> Github repository with code example: https://github.com/EntilZha/spark-zip-bug
> Steps to run: execute "sbt run", wait for failure
> Stack trace:
> java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
> 	at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:57)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
> 	at App$.main(App.scala:33)
> 	at App.main(App.scala)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:483)
> I am looking into the relevant classes, but insight would be appreciated. This ticket
> may also be related to https://issues.apache.org/jira/browse/SPARK-2823 and
> https://issues.apache.org/jira/browse/SPARK-5351



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

