spark-user mailing list archives

From Wisc Forum <>
Subject Spark map function question
Date Tue, 22 Oct 2013 04:18:49 GMT
Hi, we have tried integrating Spark with our existing code and see some issues.

The issue is that when we map with the function below (where func is a function that processes elem):

{ elem => func.apply(elem) }

in the log, I see the apply function applied several times for the same element elem instead
of once.

When I execute this in a sequential way (see below), everything works just fine.

sparkContext.parallelize( => proj.apply(elem)))

(The only reason I used sparkContext.parallelize in the above line is that the method
requires returning RDD[MyDataType].)

Why does this happen? Does the map function require something special from the RDD?
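For reference, the kind of re-evaluation in question can be sketched without Spark at all. A plain-Scala lazy view re-applies its map function on every traversal, which is analogous to how an RDD map transformation may be re-run for every action unless the RDD is cached or persisted. This is only an illustrative sketch of that behavior (the object and method names below are made up, and a view is not an RDD):

```scala
object LazyRecompute {
  // Counts how many times the mapping function runs across `traversals`
  // traversals of a 3-element lazy view.
  def callsFor(traversals: Int): Int = {
    var calls = 0
    // Like an RDD transformation, .view.map does no work up front.
    val mapped = (1 to 3).view.map { elem => calls += 1; elem * 2 }
    // Each traversal plays the role of a Spark action (e.g. count, collect),
    // re-applying the function to every element.
    (1 to traversals).foreach(_ => mapped.sum)
    calls
  }

  def main(args: Array[String]): Unit = {
    println(callsFor(1)) // the function runs 3 times (once per element)
    println(callsFor(2)) // 6 times: the second traversal re-applies it
  }
}
```

If the same pattern explains the Spark logs here, caching the mapped RDD (rdd.cache() or rdd.persist()) before running multiple actions would make each element be processed only once.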
