The problem I am looking at is as follows: 

- I read in a log file covering multiple users as an RDD

- I'd like to split that RDD into multiple RDDs, one per userId (the key)

- my processEachUser() function then takes the RDD for one individual user and calls RDD.map or DataFrame operations on it. (I have already written this function, so I am reluctant to rework it to accept the ResultIterable object that comes out of rdd.groupByKey() ...)
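To make the setup above concrete, here is a minimal sketch of the first step, keying the log by userId. The tab-separated line format, the names parse_line and load_keyed_log, and the path argument are all assumptions for illustration; the real log format is not described here.

```python
# Hypothetical log format (an assumption): "<userId>\t<event>" on each line.
def parse_line(line):
    """Split one log line into a (userId, event) pair."""
    user_id, event = line.rstrip("\n").split("\t", 1)
    return user_id, event


def load_keyed_log(sc, path):
    """Read the log as an RDD of (userId, event) pairs.

    `sc` is an existing pyspark SparkContext and `path` is a placeholder
    for wherever the log file lives.
    """
    return sc.textFile(path).map(parse_line)


# Local check of the parser on a few made-up lines; no Spark needed.
sample = ["alice\tlogin", "bob\tlogin", "alice\tclick"]
pairs = [parse_line(line) for line in sample]
```

The result of load_keyed_log() is a single pair RDD; the open question is how to get from there to something processEachUser() can consume per user.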

I've searched the mailing list and googled "RDD of RDDs", and it seems that this is not supported at all.

The remaining choices seem to be: 1) groupByKey() and then work with the ResultIterable objects; 2) groupByKey(), write each group to a file, and read the files back as individual RDDs to process.
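For reference, option 1 might look roughly like the sketch below. process_each_user_local is a hypothetical stand-in for processEachUser(), rewritten to accept a plain Python iterable instead of an RDD; the summary it computes is made up purely for illustration.

```python
def process_each_user_local(events):
    """Hypothetical per-user logic operating on a plain iterable.

    A ResultIterable can be iterated or passed to list() like any
    other Python iterable.
    """
    events = list(events)
    return {"n_events": len(events), "distinct": sorted(set(events))}


def process_all_users(keyed_rdd):
    """Apply the per-user function to every group.

    `keyed_rdd` is assumed to be an RDD of (userId, event) pairs.
    Returns an RDD of (userId, summary) pairs.
    """
    return keyed_rdd.groupByKey().mapValues(process_each_user_local)


# Local check of the per-user function on made-up events; no Spark needed.
result = process_each_user_local(iter(["login", "click", "login"]))
```

The catch is that the function passed to mapValues() runs on the executors, so it cannot call RDD or DataFrame operations itself, and all events for one user have to fit in memory on a single executor. That is exactly why reusing the existing processEachUser() unchanged does not work with this option.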

Does anyone have a better idea, or has anyone run into a similar problem before?


Ping Yan
Ph.D. in Management
Dept. of Management Information Systems
University of Arizona
Tucson, AZ 85721