spark-dev mailing list archives

From Andrew Melo <>
Subject Exposing functions to pyspark
Date Mon, 30 Sep 2019 20:48:03 GMT

I'm working on a DSv2 implementation with a userbase that is 100% pyspark based.

There's some interesting additional DS-level functionality I'd like to
expose from the Java side to pyspark -- e.g., I/O metrics, which source
site provided the data, and so on.

Does someone have an example of how to expose that to pyspark? We
provide a python library for scientists to use, so I can also provide
the python half; I just don't know where to begin. Part of the mental
issue I'm having is that when a user does the following in pyspark:

df = \
                .format('edu.vanderbilt.accre.laurelin.Root') \
                .option("tree", "tree") \
                .load(path)

They don't have a reference to any of my DS objects -- "df" is a
DataFrame object, which I don't own.

Does anyone have a tip?

