spark-user mailing list archives

From Xiangrui Meng <men...@gmail.com>
Subject Re: repartition combined with zipWithIndex get stuck
Date Sun, 16 Nov 2014 04:38:31 GMT
I think I understand where the bug is now. I created a JIRA
(https://issues.apache.org/jira/browse/SPARK-4433) and will make a PR
soon. -Xiangrui

On Sat, Nov 15, 2014 at 7:39 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
> This is a bug. Could you make a JIRA? -Xiangrui
>
> On Sat, Nov 15, 2014 at 3:27 AM, lev <katzav@gmail.com> wrote:
>> Hi,
>>
>> I'm having trouble using zipWithIndex together with repartition. When I use
>> both, the subsequent action hangs and never returns.
>> I'm using Spark 1.1.0.
>>
>>
>> These two lines work as expected:
>>
>> scala> sc.parallelize(1 to 10).repartition(10).count()
>> res0: Long = 10
>>
>> scala> sc.parallelize(1 to 10).zipWithIndex.count()
>> res1: Long = 10
>>
>>
>> But this statement gets stuck and doesn't return:
>>
>> scala> sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
>> 14/11/15 03:18:55 INFO spark.SparkContext: Starting job: apply at
>> Option.scala:120
>> 14/11/15 03:18:55 INFO scheduler.DAGScheduler: Got job 3 (apply at
>> Option.scala:120) with 3 output partitions (allowLocal=false)
>> 14/11/15 03:18:55 INFO scheduler.DAGScheduler: Final stage: Stage 4(apply at
>> Option.scala:120)
>> 14/11/15 03:18:55 INFO scheduler.DAGScheduler: Parents of final stage:
>> List()
>> 14/11/15 03:18:55 INFO scheduler.DAGScheduler: Missing parents: List()
>> 14/11/15 03:18:55 INFO scheduler.DAGScheduler: Submitting Stage 4
>> (ParallelCollectionRDD[7] at parallelize at <console>:13), which has no
>> missing parents
>> 14/11/15 03:18:55 INFO storage.MemoryStore: ensureFreeSpace(1096) called
>> with curMem=7616, maxMem=138938941
>> 14/11/15 03:18:55 INFO storage.MemoryStore: Block broadcast_4 stored as
>> values in memory (estimated size 1096.0 B, free 132.5 MB)
>>
>>
>> Am I doing something wrong here, or is this a bug?
>> Is there a workaround?
>>
>> Thanks,
>> Lev.
>>
>>
>>
>> --
>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/repartition-combined-with-zipWithIndex-get-stuck-tp18999.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>