spark-user mailing list archives

From Sergey <>
Subject strange behavior of pyspark RDD zip
Date Fri, 01 Apr 2016 18:08:06 GMT

I'm on Spark 1.6.1 in local mode on Windows.

I have an issue with zipping two RDDs of __equal__ size and an __equal__
number of partitions (I also tried repartitioning both RDDs to one partition).
I get the following exception when I do so:

  File "c:\spark\python\lib\\pyspark\", line 111, in main
  File "c:\spark\python\lib\\pyspark\", line 106, in process
  File "c:\spark\python\lib\\pyspark\", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "c:\spark\python\pyspark\", line 95, in func
    for obj in iterator:
  File "c:\spark\python\lib\\pyspark\", line 322, in load_stream
    " in pair: (%d, %d)" % (len(keys), len(vals)))
ValueError: Can not deserialize RDD with different number of items in pair: (256, 512)
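
The (256, 512) in the message looks like two batch lengths rather than two RDD sizes. The sketch below is a simplified pure-Python model of that kind of check, not PySpark's actual code: the helpers `batched` and `zip_batches` are hypothetical names, and the idea that the two sides were serialized with different batch sizes is an assumption that fits the numbers in the traceback but is not confirmed here. It shows that two sequences of equal total length can still fail a batch-by-batch pairing.

```python
import itertools


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items (hypothetical helper)."""
    it = iter(items)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch


def zip_batches(key_batches, val_batches):
    """Pair items batch by batch, rejecting batches of unequal length.

    A simplified stand-in for the check seen in the traceback; the real
    deserializer may differ in detail.
    """
    for keys, vals in zip(key_batches, val_batches):
        if len(keys) != len(vals):
            raise ValueError(
                "Can not deserialize RDD with different number of items"
                " in pair: (%d, %d)" % (len(keys), len(vals)))
        for pair in zip(keys, vals):
            yield pair


left = range(1024)
right = range(1024)  # same total size as left

# Same batch size on both sides: every batch lines up and pairing succeeds.
same = list(zip_batches(batched(left, 256), batched(right, 256)))

# Different batch sizes: the first pair of batches already has 256 vs 512 items.
try:
    list(zip_batches(batched(left, 256), batched(right, 512)))
    error = None
except ValueError as exc:
    error = str(exc)
```

If this model applies, equal element counts and equal partition counts would not be enough on their own; the batches within each partition would also have to line up.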
