spark-dev mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Scala 2.13 actual class used for Seq
Date Mon, 19 Oct 2020 12:24:23 GMT
Scala 2.13 changed the type alias scala.Seq to point to immutable.Seq, yes. So
lots of things will now return an immutable Seq. Almost no code cares which
Seq implementation it gets back, and we didn't change any of that in the code,
so this is just the 'default' we get from whatever operations produce the
Seq. (But a user app expecting a Seq in 2.13 will still just work, as it
will be expecting an immutable.Seq then.)
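A minimal sketch of what the alias change means in practice, assuming Scala 2.13 and plain collections (no Spark involved). The object and method names (SeqAliasDemo, className, isDefaultSeq, isCollectionSeq) are hypothetical, chosen only for illustration. It shows that a List satisfies the default Seq, while a mutable ArraySeq is still a scala.collection.Seq but no longer a plain Seq:

```scala
import scala.collection.mutable

// Assumes Scala 2.13, where the default Seq (scala.Seq) is an alias for
// scala.collection.immutable.Seq. Under 2.12 these checks behave differently.
object SeqAliasDemo {
  // Name of the concrete runtime class behind a Seq, like the UDF in the thread.
  def className(s: Seq[_]): String = s.getClass.getName

  // True when the value satisfies the default Seq type (immutable in 2.13).
  def isDefaultSeq(x: Any): Boolean = x.isInstanceOf[Seq[_]]

  // True when the value satisfies the general scala.collection.Seq type.
  def isCollectionSeq(x: Any): Boolean = x.isInstanceOf[scala.collection.Seq[_]]

  def main(args: Array[String]): Unit = {
    // A List is an immutable.Seq; its non-empty cells are instances of ::
    println(className(List(2, 3))) // scala.collection.immutable.$colon$colon

    // A mutable ArraySeq (what WrappedArray became) is a collection.Seq,
    // but not the default immutable Seq in 2.13.
    val m: Any = mutable.ArraySeq.make(Array(2, 3))
    println(isDefaultSeq(m))    // false under 2.13
    println(isCollectionSeq(m)) // true
  }
}
```

The $colon$colon in the output is simply the JVM-encoded name of the List cons class ::, which is what the 2.13 REPL session in the quoted message prints.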

You're right that many things don't necessarily return a WrappedArray
anymore (in 2.13 WrappedArray survives only as a deprecated alias for
mutable.ArraySeq), so user apps may need to change for 2.13; but there are
N things that any 2.13 app would have to change.
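For comparison with the ArraySeq suggestion in the quoted message, here is a minimal sketch, assuming Scala 2.13, of wrapping an array as an immutable Seq without copying and getting the array back. ArraySeq.unsafeWrapArray and unsafeArray are standard 2.13 library members; the ArraySeqDemo object and its wrap and backingArray helpers are hypothetical names for illustration, not anything Spark provides:

```scala
import scala.collection.immutable.ArraySeq

// Assumes Scala 2.13. immutable.ArraySeq can wrap an existing array in O(1).
object ArraySeqDemo {
  // Wraps the array without copying. "Unsafe" because later writes to the
  // array remain visible through the returned Seq.
  def wrap(arr: Array[Int]): Seq[Int] = ArraySeq.unsafeWrapArray(arr)

  // Recovers the backing array when the Seq happens to be an immutable.ArraySeq;
  // any other Seq implementation (e.g. a List) yields None.
  def backingArray(s: Seq[Int]): Option[Array[_]] = s match {
    case a: ArraySeq[_] => Some(a.unsafeArray)
    case _              => None
  }

  def main(args: Array[String]): Unit = {
    val arr = Array(2, 3)
    val s = wrap(arr)
    println(s.getClass.getName)           // scala.collection.immutable.ArraySeq$ofInt
    println(backingArray(s).exists(_ eq arr)) // true: same array, no copy
    arr(0) = 42                           // shared storage shows through s
    println(s.head)                       // 42
  }
}
```

Code that only uses the Seq API (head, map, sum, and so on) works the same whether it receives a List or an ArraySeq; only code that matches on a concrete class such as WrappedArray needs to change.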

On Mon, Oct 19, 2020 at 12:29 AM Koert Kuipers <koert@tresata.com> wrote:

> I have gotten used to Spark always returning a WrappedArray for Seq. At
> some point I think I even read this was guaranteed to be the case; not sure
> if it still is...
>
> In Spark 3.0.1 with Scala 2.12 I get a WrappedArray as expected:
>
> scala> val x = Seq((1,2),(1,3)).toDF
> x: org.apache.spark.sql.DataFrame = [_1: int, _2: int]
>
> scala> x.groupBy("_1").agg(collect_list(col("_2")).as("_3")).withColumn("class_of_3", udf{ (s: Seq[Int]) => s.getClass.toString }.apply(col("_3"))).show(false)
> +---+------+-------------------------------------------------+
> |_1 |_3    |class_of_3                                       |
> +---+------+-------------------------------------------------+
> |1  |[2, 3]|class scala.collection.mutable.WrappedArray$ofRef|
> +---+------+-------------------------------------------------+
>
> But when I build current master with Scala 2.13 I get:
>
> scala> val x = Seq((1,2),(1,3)).toDF
> warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation' or `:replay -deprecation'
> val x: org.apache.spark.sql.DataFrame = [_1: int, _2: int]
>
> scala> x.groupBy("_1").agg(collect_list(col("_2")).as("_3")).withColumn("class", udf{ (s: Seq[Int]) => s.getClass.toString }.apply(col("_3"))).show(false)
> +---+------+---------------------------------------------+
> |_1 |_3    |class                                        |
> +---+------+---------------------------------------------+
> |1  |[2, 3]|class scala.collection.immutable.$colon$colon|
> +---+------+---------------------------------------------+
>
> I am curious whether we are planning on returning an immutable Seq going
> forward (which is nice)? And if so, is List the best choice? I was sort of
> guessing it would be an immutable ArraySeq perhaps (given it provides
> efficient ways to wrap an array and access the underlying array)?
>
> Best
>
