spark-user mailing list archives

From Adelbert Chang <adelbe...@gmail.com>
Subject Re: Getting around Serializability issues for types not in my control
Date Mon, 23 Mar 2015 20:03:22 GMT
Is there no way to pull out the bits of the instance I want before I send
it through the closure for aggregate? I did try pulling things out, along
the lines of

def foo[G[_], B](blah: Blah)(implicit G: Applicative[G]) = {
  val lift: B => G[RDD[B]] = b => G.point(sparkContext.parallelize(List(b)))

  rdd.aggregate(/* use lift in here */)
}

But that doesn't seem to work either; Spark still seems to be trying to
serialize the Applicative... :(
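(Editor's note: a standalone sketch of why the attempt above still fails. `lift` closes over `G`, so when Spark serializes the closure passed to `aggregate`, it drags the Applicative instance along with it; looking the instance up inside the closure body avoids the capture. The names here, `Point`, `Instances`, `SerDemo`, are hypothetical stand-ins for `Applicative` and the anonymous `OptionInstances$$anon$1` from the stack trace; plain Scala, no Spark or Scalaz required.)

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for scalaz.Applicative, reduced to `point`.
trait Point[F[_]] { def point[A](a: A): F[A] }

object Instances {
  // Anonymous, non-Serializable instance, mirroring
  // scalaz.std.OptionInstances$$anon$1 from the stack trace.
  val optionPoint: Point[Option] = new Point[Option] {
    def point[A](a: A): Option[A] = Some(a)
  }
}

object SerDemo {
  // True if `obj` survives plain Java serialization, which is what
  // Spark applies to closures passed to methods like aggregate.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val G: Point[Option] = Instances.optionPoint

    // Closes over the local `G`: serializing the closure tries to
    // serialize the instance too, reproducing the aggregate failure.
    val capturing: Int => Option[Int] = n => G.point(n)

    // Looks the instance up inside the body instead; nothing
    // non-serializable ends up in the closure's captured environment.
    val rederiving: Int => Option[Int] = n => Instances.optionPoint.point(n)

    assert(!serializes(capturing), "capturing closure should fail to serialize")
    assert(serializes(rederiving), "re-deriving closure should serialize")
    println("ok")
  }
}
```

The common workarounds follow the second shape: re-resolve the implicit inside the closure body, or hold the instance in a `@transient lazy val` so it is rebuilt on each executor rather than shipped over the wire.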

On Mon, Mar 23, 2015 at 12:27 PM, Dean Wampler <deanwampler@gmail.com>
wrote:

> Well, it's complaining about trait OptionInstances which is defined in
> Option.scala in the std package. Use scalap or javap on the scalaz library
> to find out which member of the trait is the problem, but since it says
> "$$anon$1", I suspect it's the first value member, "implicit val
> optionInstance", which has a long list of mixin traits, one of which is
> probably at fault. OptionInstances is huge, so there might be other
> offenders.
>
> Scalaz wasn't designed for distributed systems like this, so you'll
> probably find many examples of nonserializability. An alternative is to
> avoid using Scalaz in any closures passed to Spark methods, but that's
> probably not what you want.
>
> dean
>
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition
> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
> On Mon, Mar 23, 2015 at 12:03 PM, adelbertc <adelbertc@gmail.com> wrote:
>
>> Hey all,
>>
>> I'd like to use the Scalaz library in some of my Spark jobs, but am
>> running
>> into issues where some stuff I use from Scalaz is not serializable. For
>> instance, in Scalaz there is a trait
>>
>> /** In Scalaz */
>> trait Applicative[F[_]] {
>>   def apply2[A, B, C](fa: F[A], fb: F[B])(f: (A, B) => C): F[C]
>>   def point[A](a: => A): F[A]
>> }
>>
>> But when I try to use it in, say, an `RDD#aggregate` call I get:
>>
>>
>> Caused by: java.io.NotSerializableException:
>> scalaz.std.OptionInstances$$anon$1
>> Serialization stack:
>>         - object not serializable (class:
>> scalaz.std.OptionInstances$$anon$1,
>> value: scalaz.std.OptionInstances$$anon$1@4516ee8c)
>>         - field (class: dielectric.syntax.RDDOps$$anonfun$1, name: G$1,
>> type:
>> interface scalaz.Applicative)
>>         - object (class dielectric.syntax.RDDOps$$anonfun$1, <function2>)
>>         - field (class:
>> dielectric.syntax.RDDOps$$anonfun$traverse$extension$1,
>> name: apConcat$1, type: interface scala.Function2)
>>         - object (class
>> dielectric.syntax.RDDOps$$anonfun$traverse$extension$1,
>> <function2>)
>>
>> Outside of submitting a PR to Scalaz to make things Serializable, what
>> else can I do? I considered something like
>>
>> implicit def applicativeSerializable[F[_]](implicit F: Applicative[F]):
>> SomeSerializableType[F] =
>>   new SomeSerializableType { ... } ??
>>
>> Not sure how to go about doing it - I looked at java.io.Externalizable,
>> but given that `scalaz.Applicative` has no value members, I'm not sure
>> how to implement the interface.
>>
>> Any guidance would be much appreciated - thanks!
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Getting-around-Serializability-issues-for-types-not-in-my-control-tp22193.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>>
>


-- 
Adelbert (Allen) Chang
