spark-user mailing list archives

From Florian M
Subject [SparkR] How to perform a for loop on a DataFrame object
Date Thu, 20 Aug 2015 10:10:11 GMT
Hi guys, 

First of all, thank you for your amazing work.

As the subject says, I am posting here because I need to run a for loop
over a DataFrame object.

Sample of my dataset (the entire dataset is ~400k lines long):

I am using Spark 1.4.1 with R 3.2.1.

I launch sparkR using (the package can be found at )

I load my dataset from HDFS using the following command (the package is
needed to load a CSV into a Spark DataFrame):

When I run a summary, the output is:

What I need to calculate is:

But as you probably know, this does not work, because the read.df function
returns an S4 object, which is not iterable.

Does anyone know how I can do that?
Maybe I have to convert the DataFrame to another type, or use a different
function to load my dataset...
I should say that I'm new to Spark and SparkR :)

Thanks for your time,

