spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abhishek Anand <abhis.anan...@gmail.com>
Subject Running glm in sparkR (data pre-processing step)
Date Mon, 30 May 2016 09:06:17 GMT
Hi ,

I want to run glm variant of sparkR for my data that is there in a csv file.

I see that the glm function in sparkR takes a spark dataframe as input.

Now, when I read a file from csv and create a spark dataframe, how could I
take care of the factor variables/columns in my data ?

Do I need to convert it to a R dataframe, convert to factor using as.factor
and create spark dataframe and run glm over it ?

But, running as.factor over big dataset is not possible.

Please suggest what is the best way to acheive this ?

What pre-processing should be done, and what is the best way to achieve it
 ?


Thanks,
Abhi

Mime
View raw message