spark-user mailing list archives

From Davies Liu <dav...@databricks.com>
Subject Re: strange behavior of pyspark RDD zip
Date Mon, 11 Apr 2016 17:39:31 GMT
It seems like a bug, could you file a JIRA for this?
(also post a way to reproduce it)


On Fri, Apr 1, 2016 at 11:08 AM, Sergey <sergun@gmail.com> wrote:
> Hi!
>
> I'm on Spark 1.6.1 in local mode on Windows.
>
> I have an issue with zipping two RDDs of __equal__ size and __equal__
> number of partitions (I also tried repartitioning both RDDs to a single
> partition).
> I get the following exception when I do rdd1.zip(rdd2).count():
>
> File "c:\spark\python\lib\pyspark.zip\pyspark\worker.py", line 111, in main
>   File "c:\spark\python\lib\pyspark.zip\pyspark\worker.py", line 106, in
> process
>   File "c:\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 263,
> in dump_stream
>     vs = list(itertools.islice(iterator, batch))
>   File "c:\spark\python\pyspark\rddsampler.py", line 95, in func
>     for obj in iterator:
>   File "c:\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 322,
> in load_stream
>     " in pair: (%d, %d)" % (len(keys), len(vals)))
> ValueError: Can not deserialize RDD with different number of items in pair:
> (256, 512)
>
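For context: the ValueError in the quoted traceback is raised by PySpark's pair deserializer, which reads one serialized batch from each of the two zipped streams and requires the batches to line up one-to-one. The numbers (256, 512) suggest the two RDDs were serialized with different batch sizes. A minimal plain-Python sketch of that check (the function and batch data here are illustrative, not PySpark's actual code):

```python
# Illustrative sketch of the pairwise batch check behind the quoted error.
# PySpark reads one batch of keys and one batch of values per step; when
# the two RDDs were serialized with different batch sizes (e.g. 256 vs
# 512), corresponding batches have different lengths and cannot be paired.

def zip_batches(key_batches, val_batches):
    """Pair items batch-by-batch, mimicking the deserializer's check."""
    for keys, vals in zip(key_batches, val_batches):
        if len(keys) != len(vals):
            raise ValueError(
                "Can not deserialize RDD with different number of items"
                " in pair: (%d, %d)" % (len(keys), len(vals)))
        for pair in zip(keys, vals):
            yield pair

# Equal batch sizes pair up cleanly:
ok = list(zip_batches([[1, 2], [3, 4]], [["a", "b"], ["c", "d"]]))

# Mismatched batch sizes reproduce the error from the traceback:
try:
    list(zip_batches([[1, 2]], [["a", "b", "c", "d"]]))
    err = None
except ValueError as e:
    err = str(e)
```

A common workaround in this situation is to avoid relying on matching batch layouts altogether, e.g. index each RDD with zipWithIndex and join on the index instead of calling zip directly.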

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

