spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-5385) Calling textFile, parallelize, zip, then partitions causes failure on some local[*]
Date Fri, 23 Jan 2015 17:46:35 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289586#comment-14289586 ]

Sean Owen commented on SPARK-5385:
----------------------------------

Based on this description, this does not sound like a bug. The error means what it says: you
can't zip RDDs unless they have the same number of partitions and each pair of corresponding
partitions has the same number of elements. I don't see a reason to expect that an arbitrary
call to textFile and parallelize meets these criteria. It may happen to work, depending on what
your data and program do, but nothing in your program guarantees it.
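The two requirements in the comment above can be illustrated with a small, Spark-free model. This is a hypothetical sketch, not Spark's implementation: `ZipModel` and its `zip` method are invented names, each "RDD" is modeled as a `Seq` of partitions (each partition a `Seq` of elements), and the error strings only mirror Spark's messages for recognizability. In Spark the partition-count check fires when partitions are computed (as in the stack trace below), while the element-count check fires only when the zipped partitions are iterated.

```scala
// Hypothetical in-memory model of zip's preconditions; NOT Spark's implementation.
// An "RDD" here is a Seq of partitions; each partition is a Seq of elements.
object ZipModel {
  def zip[A, B](left: Seq[Seq[A]], right: Seq[Seq[B]]): Seq[Seq[(A, B)]] = {
    // Requirement 1: same number of partitions (checked up front).
    if (left.length != right.length)
      throw new IllegalArgumentException("Can't zip RDDs with unequal numbers of partitions")
    left.zip(right).map { case (l, r) =>
      // Requirement 2: corresponding partitions hold the same number of elements.
      // (Spark only discovers this at run time, while iterating the partitions.)
      if (l.length != r.length)
        throw new IllegalStateException(
          "Can only zip RDDs with same number of elements in each partition")
      l.zip(r)
    }
  }

  def main(args: Array[String]): Unit = {
    // Two partitions each, with 2/1 elements on both sides: zips fine.
    println(zip(Seq(Seq("a", "b"), Seq("c")), Seq(Seq(1, 2), Seq(3))))
    // Two partitions each, but 2/1 vs 1/2 elements: fails despite equal partition counts.
    println(scala.util.Try(zip(Seq(Seq("a", "b"), Seq("c")), Seq(Seq(1), Seq(2, 3)))).isFailure)
  }
}
```

The second call shows why matching partition counts alone is not enough.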

> Calling textFile, parallelize, zip, then partitions causes failure on some local[*]
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-5385
>                 URL: https://issues.apache.org/jira/browse/SPARK-5385
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Pedro Rodriguez
>
> There is a bug in Spark core which produces the exception: "Can't zip RDDs with unequal numbers of partitions"
> General steps to reproduce:
> 1. Run sc.textFile
> 2. Run sc.parallelize
> 3. Zip the results of steps 1 and 2
> 4. Call partitions on the result of the zip
> 5. Run for local, local[2], local[3], ...
> 6. On my machine (a MacBook Air), it fails on local[3].
> GitHub repository with code example: https://github.com/EntilZha/spark-zip-bug
> Steps to run: execute "sbt run" and wait for the failure
> Stack trace:
> java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
> 	at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:57)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
> 	at App$.main(App.scala:33)
> 	at App.main(App.scala)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:483)
> I am looking into the relevant classes, but insight would be appreciated. This ticket may also be related to https://issues.apache.org/jira/browse/SPARK-2823 and https://issues.apache.org/jira/browse/SPARK-5351
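A plausible explanation for why the reported failure starts at local[3], sketched as a Spark-free model under stated assumptions: in local[N] mode defaultParallelism is N (with spark.default.parallelism unset), sc.parallelize defaults to defaultParallelism slices, and sc.textFile defaults minPartitions to min(defaultParallelism, 2). The model additionally assumes the input is a small file that Hadoop splits into exactly minPartitions pieces, which is typical but not guaranteed. `LocalPartitionModel` and its methods are hypothetical names.

```scala
// Hypothetical model of default partition counts in local[N] mode; NOT Spark code.
object LocalPartitionModel {
  // Assumption: in local[N], defaultParallelism == N (spark.default.parallelism unset).
  def defaultParallelism(n: Int): Int = n

  // sc.parallelize(seq) defaults its slice count to defaultParallelism.
  def parallelizePartitions(n: Int): Int = defaultParallelism(n)

  // sc.textFile(path) defaults minPartitions to min(defaultParallelism, 2).
  // Assumption: a small file yields exactly minPartitions splits
  // (Hadoop's split computation can produce more).
  def textFilePartitions(n: Int): Int = math.min(defaultParallelism(n), 2)

  // zip's partition-count check fails whenever the two counts differ.
  def zipFails(n: Int): Boolean = parallelizePartitions(n) != textFilePartitions(n)

  def main(args: Array[String]): Unit =
    for (n <- 1 to 4)
      println(s"local[$n]: textFile=${textFilePartitions(n)}, " +
        s"parallelize=${parallelizePartitions(n)}, zip fails=${zipFails(n)}")
}
```

Under these assumptions the counts agree for local and local[2] and diverge from local[3] upward, which matches the report. Passing an explicit slice count to parallelize could align the partition counts, but zip would still require matching element counts in each partition.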



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


