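You can use rank with a window function. Filtering on rank = 1 gives the same result as calling first(), as long as the ordering has no ties within a group. If ties are possible, row_number is the safer choice, since it assigns each row a distinct number.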

Not sure how you would randomly pick records though, if there is no Nth record. In your example, what happens if the data has only 2 rows?

Spark SQL has a "first" function that returns the first item in a group. Is there a similar function, perhaps in a third-party lib, that lets you return an arbitrary (e.g. 3rd) item from the group? I was thinking of writing a UDAF for it, but didn't want to reinvent the wheel. My end goal is to be able to select a random item from the group, using a random number generator.