spark-dev mailing list archives

From: Erik Erlandson <eerla...@redhat.com>
Subject: UDAFs have an inefficiency problem
Date: Wed, 27 Mar 2019 23:19:21 GMT

I describe some of the details here:
https://issues.apache.org/jira/browse/SPARK-27296

The short version of the story is that the aggregating data structures (UDTs)
used by UDAFs are serialized to a Row object, and then deserialized again, for
every row in a data frame.
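
To make the cost pattern concrete, here is a toy model in plain Python. It is
not Spark code: JSON stands in for the Row encoding, a small dict stands in for
the UDT, and the names `CountingCodec`, `aggregate_per_row`, and
`aggregate_once` are hypothetical, made up for this sketch. It only counts how
often the aggregation state crosses the serialization boundary in each style.

```python
import json


class CountingCodec:
    """Hypothetical stand-in for the UDT <-> Row conversion; counts each call."""

    def __init__(self):
        self.ser_calls = 0
        self.de_calls = 0

    def serialize(self, state):
        self.ser_calls += 1
        return json.dumps(state)

    def deserialize(self, blob):
        self.de_calls += 1
        return json.loads(blob)


def update(state, x):
    """Fold one input value into the aggregation state."""
    state["count"] += 1
    state["total"] += x
    return state


def aggregate_per_row(rows, codec):
    """Pattern described in this message: the state is stored serialized,
    so every input row pays one deserialize and one serialize."""
    buf = codec.serialize({"count": 0, "total": 0})
    for x in rows:
        state = codec.deserialize(buf)
        buf = codec.serialize(update(state, x))
    return codec.deserialize(buf)


def aggregate_once(rows, codec):
    """Assumed alternative for comparison: keep the live object while
    updating and serialize only once, at the end."""
    state = {"count": 0, "total": 0}
    for x in rows:
        state = update(state, x)
    return codec.deserialize(codec.serialize(state))


rows = list(range(1000))
per_row_codec, once_codec = CountingCodec(), CountingCodec()
per_row_result = aggregate_per_row(rows, per_row_codec)
once_result = aggregate_once(rows, once_codec)

print(per_row_codec.ser_calls, per_row_codec.de_calls)  # 1001 1001
print(once_codec.ser_calls, once_codec.de_calls)        # 1 1
```

Both functions return the same result; the only difference is that the number
of conversions grows with the row count in the first and stays constant in the
second.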
Cheers,
Erik