spark-dev mailing list archives

From Michael Shtelma <>
Subject Compiling Spark UDF at runtime
Date Fri, 12 Jan 2018 11:57:42 GMT
Hi all,

I would like to be able to compile Spark UDFs at runtime. Right now I
am using Janino for that.
My problem is that, in order to make my compiled functions visible to
Spark, I have to set the Janino class loader (Janino gives me a class
loader containing the compiled UDF classes) as the context class loader
before I create the SparkSession. This approach works locally for
debugging purposes, but it will not work in cluster mode, because the
UDF classes will not be distributed to the worker nodes.
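
For illustration, here is a minimal sketch of the driver-local setup described above, assuming a full JDK (not just a JRE). It swaps Janino for the JDK's own javax.tools compiler so that it runs without extra dependencies (Janino's SimpleCompiler fills the same role via cook(...) and getClassLoader()). The names RuntimeUdfCompiler, PlusOne and loadFunction are hypothetical. It also shows why the trick stays local: the class loader exists only inside the driver JVM.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class RuntimeUdfCompiler {

    // Hypothetical helper: compiles one source file and returns a loader that sees the result.
    static URLClassLoader compile(String className, String source) throws IOException {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler; run this on a JDK");
        }
        Path dir = Files.createTempDirectory("udf-classes");
        Path src = dir.resolve(className + ".java");
        Files.write(src, source.getBytes(StandardCharsets.UTF_8));
        // Equivalent to `javac -d <dir> <dir>/<ClassName>.java`; returns 0 on success.
        int status = javac.run(null, null, null, "-d", dir.toString(), src.toString());
        if (status != 0) {
            throw new IllegalStateException("Compilation failed for " + className);
        }
        // Parent is the current context loader, so the UDF can still see application classes.
        return new URLClassLoader(new URL[] {dir.toUri().toURL()},
                Thread.currentThread().getContextClassLoader());
    }

    // Instantiates the compiled class through the given loader and views it as a Function.
    @SuppressWarnings("unchecked")
    static Function<Integer, Integer> loadFunction(ClassLoader loader, String className)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className, true, loader);
        return (Function<Integer, Integer>) cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        String source = "public class PlusOne implements java.util.function.Function<Integer, Integer> {"
                + " public Integer apply(Integer x) { return x + 1; } }";
        URLClassLoader loader = compile("PlusOne", source);
        // The approach from the mail: publish the loader as the context class loader
        // before building the SparkSession. This only affects threads in this JVM
        // (the driver); executor JVMs on other hosts never see these classes.
        Thread.currentThread().setContextClassLoader(loader);
        System.out.println(loadFunction(loader, "PlusOne").apply(41)); // prints 42
    }
}
```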

An alternative is to register the UDFs via Hive functionality and
generate a temporary jar somewhere, which, at least in standalone
cluster mode, will be made available to the Spark workers through the
embedded HTTP server. As far as I understand, this will not work in
YARN mode.
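
Either way, the jar has to be produced on the fly. Here is a minimal sketch that uses only java.util.jar, assuming the compiled classes are available as .class files in an output directory (as with `javac -d`). TempJarBuilder, collectClasses and buildJar are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class TempJarBuilder {

    // Gathers every .class file under a compiler output directory,
    // keyed by its jar entry name (e.g. "com/example/PlusOne.class").
    static Map<String, byte[]> collectClasses(Path outputDir) throws IOException {
        List<Path> classFiles;
        try (Stream<Path> paths = Files.walk(outputDir)) {
            classFiles = paths.filter(p -> p.toString().endsWith(".class"))
                              .collect(Collectors.toList());
        }
        Map<String, byte[]> entries = new TreeMap<>();
        for (Path p : classFiles) {
            // Jar entry names always use '/', whatever the platform separator is.
            String name = outputDir.relativize(p).toString().replace(File.separatorChar, '/');
            entries.put(name, Files.readAllBytes(p));
        }
        return entries;
    }

    // Writes the entries into a fresh temporary jar (no manifest) and returns its path.
    static Path buildJar(Map<String, byte[]> entries) throws IOException {
        Path jar = Files.createTempFile("runtime-udfs-", ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                jos.putNextEntry(new JarEntry(e.getKey()));
                jos.write(e.getValue());
                jos.closeEntry();
            }
        }
        return jar;
    }
}
```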

I am now wondering: what is the best way to approach this problem? My
current best idea is to develop my own small Netty-based file web
server and use it to distribute my custom jar, which can be created on
the fly, to the workers in both standalone and YARN modes. Can I
reference the jar as an HTTP URL using extra driver options, and then
register the UDFs contained in this jar using the spark.udf().*
methods?
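
You can prototype the HTTP-distribution idea without Netty. The JDK ships a small server in com.sun.net.httpserver, which this sketch uses instead so that it has no dependencies. JarHttpServer, the /jars/ route and the host/port parameters are hypothetical. The host must be an address the worker nodes can actually reach.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

final class JarHttpServer {

    // Serves a single jar at /jars/<file name>; port 0 lets the OS choose a free port.
    static HttpServer serve(Path jar, String host, int port) throws IOException {
        String route = "/jars/" + jar.getFileName();
        HttpServer server = HttpServer.create(new InetSocketAddress(host, port), 0);
        server.createContext(route, exchange -> {
            try {
                // Contexts match by prefix, so reject anything but the exact jar path.
                if (!route.equals(exchange.getRequestURI().getPath())) {
                    exchange.sendResponseHeaders(404, -1); // -1 = no response body
                    return;
                }
                byte[] body = Files.readAllBytes(jar);
                exchange.getResponseHeaders().set("Content-Type", "application/java-archive");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            } finally {
                exchange.close();
            }
        });
        // null = handle requests on the server's own dispatcher thread; enough for a
        // handful of executors fetching one jar, and no extra thread pool to shut down.
        server.setExecutor(null);
        server.start();
        return server;
    }

    // URL the executors would download from; `host` must be reachable from the workers.
    static URI urlFor(HttpServer server, String host, Path jar) {
        return URI.create("http://" + host + ":" + server.getAddress().getPort()
                + "/jars/" + jar.getFileName());
    }
}
```

The resulting http://host:port/jars/... URL is what would be handed to Spark. SparkContext.addJar (JavaSparkContext.addJar in Java) lists HTTP, HTTPS and FTP URIs among the paths it accepts. The spark.jars setting takes a comma-separated list of jars. This sketch does not establish whether that is enough to register the contained classes through spark.udf() in YARN mode.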

Does anybody have any better ideas?
Any assistance would be greatly appreciated!