I used the word "streaming", but I did not mean Spark Streaming. What I meant is this: if a partition containing 10 objects was Kryo-serialized into a single buffer, then in a mapPartitions() call, as I call iter.next() 10 times to access these objects one at a time, does the deserialization happen
a) once to get all 10 objects,
b) 10 times "incrementally" to get an object at a time, or
c) 10 times to get 10 objects, discarding the 9 "wrong" objects each time [I doubt anyone would have adopted this design]?
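To make the difference between (a) and (b) concrete, here is a minimal sketch of option (b), incremental deserialization. It uses Python's `pickle` as a stand-in for Kryo (an assumption for illustration only; it does not show Spark's actual code path). Ten objects are written back-to-back into one buffer, and a generator decodes exactly one object per `next()` call. The `stats` counter and both helper names are hypothetical.

```python
import io
import pickle


def serialize_partition(objects):
    """Write every object back-to-back into a single in-memory buffer."""
    buf = io.BytesIO()
    for obj in objects:
        pickle.dump(obj, buf)
    return buf.getvalue()


def lazy_iterator(data, stats):
    """Option (b): each next() call decodes only the next record in the buffer.

    `stats["decoded"]` is a hypothetical counter that records how many
    objects have actually been deserialized so far.
    """
    stream = io.BytesIO(data)
    while True:
        try:
            obj = pickle.load(stream)  # reads exactly one object from the stream
        except EOFError:               # raised once the buffer is exhausted
            return
        stats["decoded"] += 1
        yield obj


if __name__ == "__main__":
    stats = {"decoded": 0}
    data = serialize_partition([{"id": i} for i in range(10)])
    it = lazy_iterator(data, stats)
    print(next(it), stats["decoded"])   # prints: {'id': 0} 1
    rest = list(it)
    print(len(rest), stats["decoded"])  # prints: 9 10
```

After the first `next()`, only one object has been decoded, so nothing is thrown away and the remaining nine stay serialized until they are requested. Option (a) would instead decode all ten up front, for example by wrapping the iterator in `list(...)`.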
I think your answer is option (a), and you referred to Spark Streaming to indicate that its behavior is no different from Spark core... right?
If it is indeed option (a), I am happy with it and don't need to customize anything. If it is (b), I would like to have (a) instead.
I am also wondering whether Kryo is good at compressing strings and numbers. Often my data type is Double, but the values could be encoded in far fewer bits.
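To illustrate the "fewer bits" point, here is a minimal sketch of one hypothetical custom encoding, assuming the values carry only a fixed number of decimal places (2 here). Each Double is scaled to an integer, zigzag-mapped so that small negative numbers stay small, and written as a variable-length integer (7 payload bits per byte). This mirrors the idea behind variable-length integer encodings, but every function name and the fixed-precision assumption are illustrative only. It is not Kryo's API, and it is lossy for values with more precision than `decimals`.

```python
import struct


def zigzag(n):
    """Map signed ints to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def unzigzag(u):
    """Inverse of zigzag()."""
    return u >> 1 if u & 1 == 0 else -((u + 1) >> 1)


def encode_varint(u):
    """7 payload bits per byte; the high bit means 'more bytes follow'."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data):
    """Read one varint from the start of `data`."""
    result, shift = 0, 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return result


def encode_scaled_double(x, decimals=2):
    """Hypothetical scheme: store round(x * 10**decimals) as a zigzag varint."""
    return encode_varint(zigzag(round(x * 10 ** decimals)))


def decode_scaled_double(data, decimals=2):
    return unzigzag(decode_varint(data)) / 10 ** decimals


if __name__ == "__main__":
    raw = struct.pack(">d", 12.34)        # fixed 8-byte IEEE 754 double
    packed = encode_scaled_double(12.34)  # 2 bytes for this value
    print(len(raw), len(packed))          # prints: 8 2
    print(decode_scaled_double(packed))   # prints: 12.34
```

The saving depends entirely on the value range. Large magnitudes or more decimal places need more varint bytes, so this only pays off when the data really is low-precision.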