spark-user mailing list archives

From Shashank Mandil <mandil.shash...@gmail.com>
Subject Spark data frame map problem
Date Tue, 21 Mar 2017 18:40:06 GMT
Hi All,

I have a Spark DataFrame with 992 rows.
When I run a map over this DataFrame, I expect the map function to be
invoked once for each of the 992 rows.

Since the map function runs on executors across the cluster, I kept a
distributed count of the number of rows it is invoked on.

dataframe.map(r => {
   //distributed count inside here using zookeeper
})

I have found that the distributed count kept inside the map function is not
exactly 992, and that it varies from run to run.

Does anybody have any idea what might be happening? By the way, I am using
Spark 1.6.1.

Thanks,
Shashank
