spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Pralabh Kumar <pralabhku...@gmail.com>
Subject Unable to pickle pySpark PipelineModel
Date Fri, 11 Dec 2020 06:11:32 GMT
Hi Dev , User

I want to store spark ml model in databases , so that I can reuse them
later on .  I am
unable to pickle them . However while using scala I am able to convert them
into byte
array stream .

So for .eg I am able to do something below in scala but not in python

 val modelToByteArray = new ByteArrayOutputStream()
 val oos = new ObjectOutputStream(modelToByteArray)
 oos.writeObject(model)
 oos.close()
 oos.flush()

spark.sparkContext.parallelize(Seq((model.uid, "my-neural-network-model",
modelToByteArray.toByteArray)))
   .saveToCassandra("dfsdfs", "models", SomeColumns("uid", "name", "model")


But pickle.dumps(model) in pyspark throws error

cannot pickle '_thread.RLock' object


Please help on the same


Regards

Pralabh

Mime
View raw message