spark-user mailing list archives

From Erik Erlandson <eerla...@redhat.com>
Subject UDAFs for sketching Dataset columns with T-Digests
Date Thu, 06 Jul 2017 00:33:28 GMT
After my talk on T-Digests in Spark at Spark Summit East, there were several
requests for an interface based on user-defined aggregate functions (UDAFs)
for working with Datasets. I'm pleased to announce that I have released a
library for T-Digest sketching with UDAFs:

https://github.com/isarn/isarn-sketches-spark

This initial release supports Scala. Future releases will add PySpark
bindings, along with additional tools for leveraging T-Digests in ML
pipelines.

Cheers!
Erik
