Transformers in Spark ML typically operate on a per-row basis, built on callUDF. For a new transformer that I'm developing, I need to transform an entire partition with a single function rather than transforming each row separately. The reason is that, in my case, rows must be processed in batches to amortize some fixed overhead. How can I accomplish this?

One option appears to be to invoke DataFrame::mapPartitions, yielding an RDD that is then converted back to a DataFrame. I'm unsure about the viability or consequences of that approach.
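To make the mapPartitions option concrete, here is a minimal sketch of what I have in mind. It goes through df.rdd rather than calling mapPartitions on the DataFrame directly, and it assumes a single Double input column. The names PartitionBatchSketch, scoreBatch, transformInBatches, and batchSize are hypothetical placeholders, and scoreBatch only stands in for the real batch computation. I expect the RDD round trip to be opaque to the query optimizer, which is part of what I'm unsure about.

```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

object PartitionBatchSketch {

  // Hypothetical batch function: a stand-in for the real work whose
  // overhead should be paid once per batch instead of once per row.
  def scoreBatch(values: Seq[Double]): Seq[Double] = values.map(_ * 2.0)

  // Appends a Double column `outputCol` derived from the Double column
  // `inputCol`, processing rows in groups of `batchSize` per partition.
  def transformInBatches(
      df: DataFrame,
      inputCol: String,
      outputCol: String,
      batchSize: Int = 1000): DataFrame = {

    val inputIndex = df.schema.fieldIndex(inputCol)
    val outputSchema = StructType(
      df.schema.fields :+ StructField(outputCol, DoubleType, nullable = false))

    val transformed = df.rdd.mapPartitions { rows =>
      // `grouped` pulls at most `batchSize` rows at a time from the
      // partition iterator, so the partition is never fully materialized.
      rows.grouped(batchSize).flatMap { batch =>
        val scores = scoreBatch(batch.map(_.getDouble(inputIndex)))
        batch.zip(scores).map { case (row, score) =>
          Row.fromSeq(row.toSeq :+ score)
        }
      }
    }

    // Convert the RDD[Row] back to a DataFrame with the extended schema.
    df.sqlContext.createDataFrame(transformed, outputSchema)
  }
}
```

A custom Transformer could presumably call something like transformInBatches from its transform method, with transformSchema appending the same output field so that the declared and actual schemas agree.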

Eron Wright