spark-user mailing list archives

From anishm
Subject How to add all combinations of items rated by user and difference between the ratings?
Date Sat, 28 Mar 2015 12:09:46 GMT
The input file is of format: userid, movieid, rating
From this input, I want to extract, for each user, all possible combinations
of movies and the difference between their ratings:

(movie1, movie2),(rating(movie1)-rating(movie2))
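
The per-user step above could be sketched as follows. This is only a minimal sketch under stated assumptions: the helper names `parse_line` and `pair_differences` are hypothetical, the input is assumed to be comma-separated with no header line, and sorting by movie id is my own choice so that every user produces the same (movie1, movie2) key order.

```python
from itertools import combinations


def parse_line(line):
    """Turn 'userid, movieid, rating' into (userid, (movieid, rating))."""
    user, movie, rating = [field.strip() for field in line.split(",")]
    return user, (movie, float(rating))


def pair_differences(movie_ratings):
    """Yield ((movie_a, movie_b), rating_a - rating_b) for one user's ratings."""
    # Assumption: sort by movie id so the pair key is ordered the same way
    # for every user; otherwise (a, b) and (b, a) would be separate keys.
    ordered = sorted(movie_ratings)
    for (m1, r1), (m2, r2) in combinations(ordered, 2):
        yield (m1, m2), r1 - r2


# In PySpark the same helpers could be plugged in roughly like this
# (sc is an existing SparkContext and path is a placeholder, both assumed):
#   pairs = (sc.textFile(path)
#              .map(parse_line)
#              .groupByKey()
#              .flatMap(lambda kv: pair_differences(kv[1])))
```

Grouping by user first keeps the pairing local to one user's ratings, at the cost of holding all of that user's ratings in memory at once.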

This should be done for each user in the dataset. Finally, I would like to
find the average disagreement for each pair of movies across users:

(movie1, movie2), average difference between ratings
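
The averaging step could look roughly like the sketch below. It assumes the records are already in the form ((movie1, movie2), difference); the function names are hypothetical. The plain-Python version shows the logic, and the RDD version expresses the same sum-and-count idea with `mapValues` and `reduceByKey`.

```python
from collections import defaultdict


def average_differences(pair_diffs):
    """Average the differences per (movie_a, movie_b) key, in plain Python."""
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [sum of diffs, count]
    for pair, diff in pair_diffs:
        totals[pair][0] += diff
        totals[pair][1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}


def average_differences_rdd(pairs_rdd):
    """Same computation on a PySpark RDD of ((movie_a, movie_b), diff) records.

    pairs_rdd is assumed to exist already; nothing here creates a SparkContext.
    """
    return (pairs_rdd
            .mapValues(lambda diff: (diff, 1))
            .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
            .mapValues(lambda s: s[0] / float(s[1])))
```

Carrying a (sum, count) pair through `reduceByKey` avoids collecting every difference for a movie pair in one place before averaging.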

How do I do this in Python?

I wrote code for Hadoop Streaming, but I am having a really hard time
converting it to Spark-compatible code.
