spark-user mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject Distributed dictionary building
Date Sat, 20 Sep 2014 19:10:25 GMT
Hi,

I am building a dictionary as an RDD[(String, Long)]. After the dictionary
is built and cached, I find the key "almonds" at index 5187 using:

rdd.filter { case (product, index) => product == "almonds" }.collect

Output:

Debug product almonds index 5187

Now I take the same dictionary and write it out as:

dictionary.map { case (product, index) => product + "," + index }
  .saveAsTextFile(outputPath)

Inside the map I also print the product at index 5187, and I get a
different product:

Debug Index 5187 userOrProduct cardigans
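
A minimal, self-contained sketch of the two checks described above, for
anyone who wants to reproduce them. It is not the original job: the
DictionaryCheck object, the local[2] master, the temporary output path, the
two sample entries, and the 5188 index for "apparel-cardigans" are all
assumptions made for illustration, and the single name `dictionary` stands in
for both `rdd` and `dictionary` above.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver program; names and data are illustrative only.
object DictionaryCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DictionaryCheck").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in data: the real dictionary is built elsewhere, and the
      // index 5188 below is an assumed value, not taken from the message.
      val dictionary = sc.parallelize(Seq(
        ("almonds", 5187L),
        ("apparel-cardigans", 5188L)
      )).cache()

      // Check 1: look up the index by product name and print on the driver.
      dictionary
        .filter { case (product, _) => product == "almonds" }
        .collect()
        .foreach { case (product, index) =>
          println(s"Debug product $product index $index")
        }

      // saveAsTextFile fails if the target directory already exists,
      // so write into a fresh path under a new temporary directory.
      val outputPath =
        Files.createTempDirectory("dictionary").resolve("out").toString

      // Check 2: look up the product by index inside the map that feeds
      // the save, then emit the "product,index" line as before.
      dictionary.map { case (product, index) =>
        if (index == 5187L) println(s"Debug Index $index userOrProduct $product")
        product + "," + index
      }.saveAsTextFile(outputPath)
    } finally {
      sc.stop()
    }
  }
}
```

Note that the println inside the map runs in the tasks, so on a cluster its
output goes to the executor logs rather than the driver console.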

Is this expected behavior from map?

By the way, "almonds" and "apparel-cardigans" are just one apart in the
index.

I am using Spark 1.1, but it is a snapshot build.

Thanks.
Deb
