spark-dev mailing list archives

From Georg Heiler <georg.kf.hei...@gmail.com>
Subject Re: Compiling Spark UDF at runtime
Date Fri, 12 Jan 2018 14:26:12 GMT
You could store the jar in HDFS. Then your given workaround should work even
in YARN cluster mode.
Michael Shtelma <mshtelma@gmail.com> wrote on Fri, 12 Jan 2018 at 12:58:

> Hi all,
>
> I would like to be able to compile Spark UDFs at runtime. Right now I
> am using Janino for that.
> My problem is that, in order to make my compiled functions visible to
> Spark, I have to set the Janino classloader (Janino gives me a classloader
> with the compiled UDF classes) as the context class loader before I create
> the Spark session. This approach works locally for debugging purposes
> but is not going to work in cluster mode, because the UDF classes will
> not be distributed to the worker nodes.
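A minimal sketch of the compile-then-set-context-loader step described above, assuming the JDK's own javax.tools compiler in place of Janino (Janino's API is not used here). RuntimeUdfCompiler and its methods are hypothetical names. The loader it installs lives only in the JVM that calls it, typically the driver, which is why this approach cannot reach the worker nodes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

/** Hypothetical helper: compile a UDF class at runtime and expose it via the context class loader. */
final class RuntimeUdfCompiler {
    private RuntimeUdfCompiler() {}

    /** Compiles a public top-level class named className (default package) into a new temp directory. */
    static Path compile(String className, String source) {
        // javax.tools stands in for Janino here; it is only available on a JDK, not a plain JRE
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler available");
        }
        try {
            Path dir = Files.createTempDirectory("udf-classes");
            Path src = dir.resolve(className + ".java");
            Files.write(src, source.getBytes(StandardCharsets.UTF_8));
            // run(...) returns 0 on success; -d writes the .class files into dir
            int rc = compiler.run(null, null, null, "-d", dir.toString(), src.toString());
            if (rc != 0) {
                throw new IllegalStateException("compilation failed for " + className);
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Installs a loader over classesDir as this thread's context class loader (this JVM only). */
    static ClassLoader installAsContextLoader(Path classesDir) {
        try {
            URL[] urls = {classesDir.toUri().toURL()};
            ClassLoader parent = Thread.currentThread().getContextClassLoader();
            ClassLoader loader = new URLClassLoader(urls, parent);
            Thread.currentThread().setContextClassLoader(loader);
            return loader;
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Loads and instantiates a compiled UDF through the context class loader. */
    static Object instantiate(String className) {
        try {
            ClassLoader ctx = Thread.currentThread().getContextClassLoader();
            return Class.forName(className, true, ctx).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For instance, compiling a source string that declares public class StrLen implements java.util.function.Function<String, Integer>, installing the output directory, and calling instantiate("StrLen") yields a usable instance, but only inside the current JVM.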
>
> An alternative is to register the UDFs via Hive functionality and generate
> a temporary jar somewhere, which, at least in Standalone cluster mode,
> will be made available to the Spark workers by the embedded HTTP server. As
> far as I understand, this is not going to work in YARN mode.
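Generating the temporary jar itself needs only java.util.jar. The sketch below assumes the compiled classes already sit in a plain directory (as javac -d or a similar step would produce); UdfJarBuilder is a hypothetical helper, and how Hive or Spark then picks the jar up is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: package compiled UDF classes into a temporary jar. */
final class UdfJarBuilder {
    private UdfJarBuilder() {}

    /** Writes every .class file under classesDir into a new temp jar and returns its path. */
    static Path buildJar(Path classesDir) {
        Manifest manifest = new Manifest();
        // Minimal manifest carrying only the standard Manifest-Version attribute
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try {
            List<Path> classFiles;
            try (Stream<Path> walk = Files.walk(classesDir)) {
                classFiles = walk.filter(p -> p.toString().endsWith(".class"))
                        .sorted()
                        .collect(Collectors.toList());
            }
            Path jar = Files.createTempFile("generated-udfs", ".jar");
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar), manifest)) {
                for (Path file : classFiles) {
                    // Jar entry names always use '/', e.g. "udf/StrLen.class"
                    String name = classesDir.relativize(file).toString().replace('\\', '/');
                    out.putNextEntry(new JarEntry(name));
                    Files.copy(file, out);
                    out.closeEntry();
                }
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Lists the entry names of a jar, e.g. to check what would be shipped to workers. */
    static List<String> entryNames(Path jar) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.stream().map(JarEntry::getName).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only .class files are packaged, so source files written next to them during compilation stay out of the jar.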
>
> I am now wondering: what is the best way to approach this problem? My
> current best idea is to develop my own small Netty-based file web server
> and use it to distribute my custom jar, which can be created
> on the fly, to the workers in both standalone and YARN modes. Can I
> reference the jar as an HTTP URL using the extra driver options and
> then register the UDFs contained in this jar using the spark.udf().* methods?
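The file-server idea can be sketched with the JDK's built-in com.sun.net.httpserver.HttpServer standing in for Netty; this is a swapped-in technique, not what the message proposes. JarHttpServer is a hypothetical name. It only covers serving the jar at a URL; whether executors in YARN mode can be pointed at such a URL via the driver options is the open question asked here.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical minimal file server for one jar; the JDK's HttpServer stands in for Netty. */
final class JarHttpServer implements AutoCloseable {
    private final HttpServer server;
    private final String path;

    /** Starts serving jar at "/<file name>"; port 0 lets the OS pick a free port. */
    JarHttpServer(Path jar, int port) {
        this.path = "/" + jar.getFileName();
        try {
            this.server = HttpServer.create(new InetSocketAddress(port), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext(path, exchange -> {
            try {
                // Re-read on each request so a regenerated jar is picked up
                byte[] body = Files.readAllBytes(jar);
                exchange.getResponseHeaders().set("Content-Type", "application/java-archive");
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
            } finally {
                exchange.close(); // closes request and response streams, ending the exchange
            }
        });
        server.start();
    }

    /** URL of the served jar; host must be an address the worker nodes can reach. */
    String url(String host) {
        return "http://" + host + ":" + server.getAddress().getPort() + path;
    }

    @Override
    public void close() {
        server.stop(0);
    }
}
```

The host passed to url(...) has to be a name or address the worker nodes can resolve and reach; 127.0.0.1 is only useful for a single-machine check.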
>
> Does anybody have any better ideas?
> Any assistance would be greatly appreciated!
>
> Thanks,
> Michael
>
