spark-dev mailing list archives

From "R. Tyler Croy" <rty...@brokenco.de>
Subject Object serialization for workers
Date Sun, 19 May 2019 16:27:13 GMT

Greetings! I am looking into the possibility of JRuby support for Spark, and
could use some pointers (references?) to orient myself a bit better within the
codebase.

JRuby fat jars load just fine in Spark, but where things start to get
predictably dicey is with object serialization for RDDs being sent to the
workers.

Having worked on something similar for Apache Storm
(https://github.com/jruby-gradle/redstorm), what we ended up doing was shimming
some classes to handle Ruby object/class serialization properly.

I'm expecting to do something similar in Spark, but I'm not entirely sure which
interfaces/classes describe the serialization of RDDs. I'm figuring that I'll
need to implement a Ruby equivalent of the org.apache.spark.api.java.function
namespace, but I'm not entirely sure where the pieces come together to turn
those into serialized objects.
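For anyone skimming: the Java function interfaces in that namespace all extend
java.io.Serializable, and their `call` method is what the workers invoke, so a
JRuby shim would need to survive a plain Java serialization round trip. Here's a
minimal, Spark-free sketch of that round trip using a simplified stand-in
interface (`SerializableFunction` and `AddOne` are hypothetical names for
illustration, not Spark classes):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Simplified stand-in for org.apache.spark.api.java.function.Function:
// the real interface also extends java.io.Serializable and declares
// `R call(T1 v1) throws Exception`, which is what lets Spark ship it
// from the driver to the workers.
interface SerializableFunction<T, R> extends Serializable {
    R call(T t) throws Exception;
}

public class ClosureShipping {
    // A concrete function object, roughly what a JRuby shim would have
    // to produce for each Ruby block/lambda.
    static class AddOne implements SerializableFunction<Integer, Integer> {
        public Integer call(Integer x) { return x + 1; }
    }

    public static void main(String[] args) throws Exception {
        SerializableFunction<Integer, Integer> f = new AddOne();

        // Round-trip through Java serialization, approximating what
        // happens when a task closure is sent to a worker.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(f);
        }
        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
        @SuppressWarnings("unchecked")
        SerializableFunction<Integer, Integer> copy =
                (SerializableFunction<Integer, Integer>) in.readObject();

        System.out.println(copy.call(41)); // prints 42
    }
}
```

The hard part for JRuby is presumably that Ruby objects carry references to the
runtime, which isn't serializable, so the shim has to marshal the Ruby state
and rehydrate it on the worker side.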


Appreciate any direction you all might be able to share; in the meantime, I've
got my miner's cap on and am presently digging through core/ :)



Cheers

--
GitHub:  https://github.com/rtyler

GPG Key ID: 0F2298A980EE31ACCA0A7825E5C92681BEF6CEA2
