spark-user mailing list archives

From Adelbert Chang <adelbe...@gmail.com>
Subject Re: Getting around Serializability issues for types not in my control
Date Mon, 23 Mar 2015 20:33:54 GMT
Instantiating the instance? The actual instance it's complaining about is:

https://github.com/scalaz/scalaz/blob/16838556c9309225013f917e577072476f46dc14/core/src/main/scala/scalaz/std/Option.scala#L10-11

The specific import where it's picking up the instance is:

https://github.com/scalaz/scalaz/blob/16838556c9309225013f917e577072476f46dc14/core/src/main/scala/scalaz/std/Option.scala#L227


Note the object extends OptionInstances which contains that instance.

Is the suggestion to pass in something like new OptionInstances { } into
the RDD#aggregate call?
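
Or is the idea to build the instance inside the closures themselves? Something
like this, maybe - a rough sketch (untested, names made up), specialized to
Option just to make it concrete:

import org.apache.spark.rdd.RDD
import scalaz.Applicative

def sumOptions(rdd: RDD[Int]): Option[Int] =
  rdd.aggregate(Option(0))(
    (acc, n) => {
      // built inside the closure, so the non-serializable
      // OptionInstances$$anon$1 never has to be shipped with it
      val G: Applicative[Option] = scalaz.std.option.optionInstance
      G.apply2(acc, G.point(n))(_ + _)
    },
    (a, b) => {
      val G: Applicative[Option] = scalaz.std.option.optionInstance
      G.apply2(a, b)(_ + _)
    }
  )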

On Mon, Mar 23, 2015 at 1:09 PM, Cody Koeninger <cody@koeninger.org> wrote:

> Have you tried instantiating the instance inside the closure, rather than
> outside of it?
>
> If that works, you may need to switch to use mapPartition /
> foreachPartition for efficiency reasons.
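>
> Roughly something like this (just a sketch, not tested) - with mapPartitions
> the instance only gets built once per partition instead of once per element:
>
> rdd.mapPartitions { iter =>
>   // constructed here, on the executor, once per partition
>   val G = scalaz.std.option.optionInstance
>   iter.map(n => G.point(n))  // or whatever per-element work needs G
> }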
>
>
> On Mon, Mar 23, 2015 at 3:03 PM, Adelbert Chang <adelbertc@gmail.com>
> wrote:
>
>> Is there no way to pull out the bits of the instance I want before I send
>> it through the closure for aggregate? I did try pulling things out, along
>> the lines of
>>
>> def foo[G[_], B](blah: Blah)(implicit G: Applicative[G]) = {
>>   val lift: B => G[RDD[B]] =
>>     b => G.point(sparkContext.parallelize(List(b)))
>>
>>   rdd.aggregate(/* use lift in here */)
>> }
>>
>> But that doesn't seem to work either; it still seems to be trying to
>> serialize the Applicative... :(
>>
>> On Mon, Mar 23, 2015 at 12:27 PM, Dean Wampler <deanwampler@gmail.com>
>> wrote:
>>
>>> Well, it's complaining about trait OptionInstances which is defined in
>>> Option.scala in the std package. Use scalap or javap on the scalaz library
>>> to find out which member of the trait is the problem, but since it says
>>> "$$anon$1", I suspect it's the first value member, "implicit val
>>> optionInstance", which has a long list of mixin traits, one of which is
>>> probably at fault. OptionInstances is huge, so there might be other
>>> offenders.
>>>
>>> Scalaz wasn't designed for distributed systems like this, so you'll
>>> probably find many examples of nonserializability. An alternative is to
>>> avoid using Scalaz in any closures passed to Spark methods, but that's
>>> probably not what you want.
>>>
>>> dean
>>>
>>> Dean Wampler, Ph.D.
>>> Author: Programming Scala, 2nd Edition
>>> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
>>> Typesafe <http://typesafe.com>
>>> @deanwampler <http://twitter.com/deanwampler>
>>> http://polyglotprogramming.com
>>>
>>> On Mon, Mar 23, 2015 at 12:03 PM, adelbertc <adelbertc@gmail.com> wrote:
>>>
>>>> Hey all,
>>>>
>>>> I'd like to use the Scalaz library in some of my Spark jobs, but am
>>>> running into issues where some stuff I use from Scalaz is not
>>>> serializable. For instance, in Scalaz there is a trait
>>>>
>>>> /** In Scalaz */
>>>> trait Applicative[F[_]] {
>>>>   def apply2[A, B, C](fa: F[A], fb: F[B])(f: (A, B) => C): F[C]
>>>>   def point[A](a: => A): F[A]
>>>> }
>>>>
>>>> But when I try to use it in, say, an `RDD#aggregate` call, I get:
>>>>
>>>>
>>>> Caused by: java.io.NotSerializableException: scalaz.std.OptionInstances$$anon$1
>>>> Serialization stack:
>>>>         - object not serializable (class: scalaz.std.OptionInstances$$anon$1,
>>>>           value: scalaz.std.OptionInstances$$anon$1@4516ee8c)
>>>>         - field (class: dielectric.syntax.RDDOps$$anonfun$1, name: G$1,
>>>>           type: interface scalaz.Applicative)
>>>>         - object (class dielectric.syntax.RDDOps$$anonfun$1, <function2>)
>>>>         - field (class: dielectric.syntax.RDDOps$$anonfun$traverse$extension$1,
>>>>           name: apConcat$1, type: interface scala.Function2)
>>>>         - object (class dielectric.syntax.RDDOps$$anonfun$traverse$extension$1,
>>>>           <function2>)
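>>>>
>>>> For reference, a stripped-down stand-in for the kind of call I'm making
>>>> (not my actual code, names simplified, but it fails the same way - the
>>>> implicit G is what ends up inside the closures):
>>>>
>>>> import org.apache.spark.rdd.RDD
>>>> import scalaz.Applicative
>>>> import scalaz.std.option._  // puts optionInstance in implicit scope
>>>>
>>>> def sumG(rdd: RDD[Int])(implicit G: Applicative[Option]): Option[Int] =
>>>>   rdd.aggregate(G.point(0))(
>>>>     (acc, n) => G.apply2(acc, G.point(n))(_ + _),  // captures G
>>>>     (a, b)   => G.apply2(a, b)(_ + _)              // captures G
>>>>   )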
>>>>
>>>> Short of submitting a PR to Scalaz to make these instances Serializable,
>>>> what can I do on my end? I considered something like
>>>>
>>>> implicit def applicativeSerializable[F[_]](implicit F: Applicative[F]):
>>>> SomeSerializableType[F] =
>>>>   new SomeSerializableType { ... } ??
>>>>
>>>> Not sure how to go about doing it - I looked at java.io.Externalizable,
>>>> but given `scalaz.Applicative` has no value members I'm not sure how to
>>>> implement the interface.
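>>>>
>>>> One shape it could maybe take (a rough sketch, untested, and I'm not sure
>>>> it's sound): a wrapper that only serializes a thunk and rebuilds the
>>>> instance lazily on the executor:
>>>>
>>>> class SerializableInstance[A](mk: () => A) extends Serializable {
>>>>   // mk must itself be serializable, e.g. a function that just re-reads
>>>>   // the instance: () => scalaz.std.option.optionInstance
>>>>   @transient lazy val value: A = mk()  // rebuilt after deserialization
>>>> }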
>>>>
>>>> Any guidance would be much appreciated - thanks!
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Adelbert (Allen) Chang
>>
>
>


-- 
Adelbert (Allen) Chang
