spark-user mailing list archives

From Shashank Mandil
Subject Spark data frame map problem
Date Tue, 21 Mar 2017 18:40:06 GMT
Hi All,

I have a Spark data frame with 992 rows.
When I run a map on this data frame, I expect the map function to be
applied to all 992 rows.

Since the mapper runs on executors in a cluster, I did a distributed count of
the number of rows the mapper is run on:

 => {
   //distributed count inside here using zookeeper

I have found that this distributed count inside the mapper is not exactly
992, and the number varies between runs.
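
A minimal sketch of the kind of setup described above, not the actual code: it assumes a Spark accumulator in place of the zookeeper counter, a local master, and made-up names (MapCountSketch, df, seen, mapped) purely for illustration. It compares the number of times the map function is invoked with the result of count().

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical sketch against the Spark 1.6 API; all names are illustrative.
object MapCountSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("map-count-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Stand-in data: a data frame with 992 rows.
    val df = sc.parallelize(1 to 992).toDF("id")

    // Assumption: a Spark accumulator replaces the zookeeper-based counter.
    val seen = sc.accumulator(0L, "rows seen in map")

    // In Spark 1.6, DataFrame.map returns an RDD; it is lazy until an action runs.
    val mapped = df.map { row =>
      seen += 1L // side effect: one increment per invocation of the function
      row.getInt(0)
    }

    val counted = mapped.count() // action: triggers the map
    println(s"count() = $counted, increments inside map = ${seen.value}")

    sc.stop()
  }
}
```

The Spark programming guide notes that accumulator updates made inside transformations may be applied more than once if tasks or stages are re-executed, so comparing the two printed numbers is one way to check whether the function ran more or fewer times than once per row.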

Does anybody have any idea what might be happening? By the way, I am using
Spark 1.6.1.

