spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From RJ Nowling <rnowl...@gmail.com>
Subject Re: enum-like types in Spark
Date Wed, 11 Mar 2015 19:56:40 GMT
How do these proposals affect PySpark?  I think compatibility with PySpark
through Py4J should be considered.

On Mon, Mar 9, 2015 at 8:39 PM, Patrick Wendell <pwendell@gmail.com> wrote:

> Does this matter for our own internal types in Spark? I don't think
> any of these types are designed to be used in RDD records, for
> instance.
>
> On Mon, Mar 9, 2015 at 6:25 PM, Aaron Davidson <ilikerps@gmail.com> wrote:
> > Perhaps the problem with Java enums that was brought up was actually that
> > their hashCode is not stable across JVMs, as it depends on the memory
> > location of the enum itself.
> >
> > On Mon, Mar 9, 2015 at 6:15 PM, Imran Rashid <irashid@cloudera.com>
> wrote:
> >
> >> Can you expand on the serde issues w/ java enum's at all?  I haven't
> heard
> >> of any problems specific to enums.  The java object serialization rules
> >> seem very clear and it doesn't seem like different jvms should have a
> >> choice on what they do:
> >>
> >>
> >>
> http://docs.oracle.com/javase/6/docs/platform/serialization/spec/serial-arch.html#6469
> >>
> >> (in a nutshell, serialization must use enum.name())
> >>
> >> of course there are plenty of ways the user could screw this up(eg.
> rename
> >> the enums, or change their meaning, or remove them).  But then again,
> all
> >> of java serialization has issues w/ serialization the user has to be
> aware
> >> of.  Eg., if we go with case objects, than java serialization blows up
> if
> >> you add another helper method, even if that helper method is completely
> >> compatible.
> >>
> >> Some prior debate in the scala community:
> >>
> >>
> https://groups.google.com/d/msg/scala-internals/8RWkccSRBxQ/AN5F_ZbdKIsJ
> >>
> >> SO post on which version to use in scala:
> >>
> >>
> >>
> http://stackoverflow.com/questions/1321745/how-to-model-type-safe-enum-types
> >>
> >> SO post about the macro-craziness people try to add to scala to make
> them
> >> almost as good as a simple java enum:
> >> (NB: the accepted answer doesn't actually work in all cases ...)
> >>
> >>
> >>
> http://stackoverflow.com/questions/20089920/custom-scala-enum-most-elegant-version-searched
> >>
> >> Another proposal to add better enums built into scala ... but seems to
> be
> >> dormant:
> >>
> >> https://groups.google.com/forum/#!topic/scala-sips/Bf82LxK02Kk
> >>
> >>
> >>
> >> On Thu, Mar 5, 2015 at 10:49 PM, Mridul Muralidharan <mridul@gmail.com>
> >> wrote:
> >>
> >> >   I have a strong dislike for java enum's due to the fact that they
> >> > are not stable across JVM's - if it undergoes serde, you end up with
> >> > unpredictable results at times [1].
> >> > One of the reasons why we prevent enum's from being key : though it is
> >> > highly possible users might depend on it internally and shoot
> >> > themselves in the foot.
> >> >
> >> > Would be better to keep away from them in general and use something
> more
> >> > stable.
> >> >
> >> > Regards,
> >> > Mridul
> >> >
> >> > [1] Having had to debug this issue for 2 weeks - I really really hate
> it.
> >> >
> >> >
> >> > On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <irashid@cloudera.com>
> >> wrote:
> >> > > I have a very strong dislike for #1 (scala enumerations).   I'm ok
> with
> >> > #4
> >> > > (with Xiangrui's final suggestion, especially making it sealed &
> >> > available
> >> > > in Java), but I really think #2, java enums, are the best option.
> >> > >
> >> > > Java enums actually have some very real advantages over the other
> >> > > approaches -- you get values(), valueOf(), EnumSet, and EnumMap.
> There
> >> > has
> >> > > been endless debate in the Scala community about the problems with
> the
> >> > > approaches in Scala.  Very smart, level-headed Scala gurus have
> >> > complained
> >> > > about their short-comings (Rex Kerr's name is coming to mind, though
> >> I'm
> >> > > not positive about that); there have been numerous well-thought out
> >> > > proposals to give Scala a better enum.  But the powers-that-be in
> Scala
> >> > > always reject them.  IIRC the explanation for rejecting is basically
> >> that
> >> > > (a) enums aren't important enough for introducing some new special
> >> > feature,
> >> > > scala's got bigger things to work on and (b) if you really need a
> good
> >> > > enum, just use java's enum.
> >> > >
> >> > > I doubt it really matters that much for Spark internals, which is
> why I
> >> > > think #4 is fine.  But I figured I'd give my spiel, because every
> >> > developer
> >> > > loves language wars :)
> >> > >
> >> > > Imran
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <mengxr@gmail.com>
> >> wrote:
> >> > >
> >> > >> `case object` inside an `object` doesn't show up in Java. This
is
> the
> >> > >> minimal code I found to make everything show up correctly in both
> >> > >> Scala and Java:
> >> > >>
> >> > >> sealed abstract class StorageLevel // cannot be a trait
> >> > >>
> >> > >> object StorageLevel {
> >> > >>   private[this] case object _MemoryOnly extends StorageLevel
> >> > >>   final val MemoryOnly: StorageLevel = _MemoryOnly
> >> > >>
> >> > >>   private[this] case object _DiskOnly extends StorageLevel
> >> > >>   final val DiskOnly: StorageLevel = _DiskOnly
> >> > >> }
> >> > >>
> >> > >> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <
> pwendell@gmail.com>
> >> > >> wrote:
> >> > >> > I like #4 as well and agree with Aaron's suggestion.
> >> > >> >
> >> > >> > - Patrick
> >> > >> >
> >> > >> > On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <
> ilikerps@gmail.com>
> >> > >> wrote:
> >> > >> >> I'm cool with #4 as well, but make sure we dictate that
the
> values
> >> > >> should
> >> > >> >> be defined within an object with the same name as the
> enumeration
> >> > (like
> >> > >> we
> >> > >> >> do for StorageLevel). Otherwise we may pollute a higher
> namespace.
> >> > >> >>
> >> > >> >> e.g. we SHOULD do:
> >> > >> >>
> >> > >> >> trait StorageLevel
> >> > >> >> object StorageLevel {
> >> > >> >>   case object MemoryOnly extends StorageLevel
> >> > >> >>   case object DiskOnly extends StorageLevel
> >> > >> >> }
> >> > >> >>
> >> > >> >> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <
> >> > >> michael@databricks.com>
> >> > >> >> wrote:
> >> > >> >>
> >> > >> >>> #4 with a preference for CamelCaseEnums
> >> > >> >>>
> >> > >> >>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <
> >> > joseph@databricks.com>
> >> > >> >>> wrote:
> >> > >> >>>
> >> > >> >>> > another vote for #4
> >> > >> >>> > People are already used to adding "()" in Java.
> >> > >> >>> >
> >> > >> >>> >
> >> > >> >>> > On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch
<
> >> javadba@gmail.com
> >> > >
> >> > >> >>> wrote:
> >> > >> >>> >
> >> > >> >>> > > #4 but with MemoryOnly (more scala-like)
> >> > >> >>> > >
> >> > >> >>> > > http://docs.scala-lang.org/style/naming-conventions.html
> >> > >> >>> > >
> >> > >> >>> > > Constants, Values, Variable and Methods
> >> > >> >>> > >
> >> > >> >>> > > Constant names should be in upper camel
case. That is, if
> the
> >> > >> member is
> >> > >> >>> > > final, immutable and it belongs to a package
object or an
> >> > object,
> >> > >> it
> >> > >> >>> may
> >> > >> >>> > be
> >> > >> >>> > > considered a constant (similar to Java'sstatic
final
> members):
> >> > >> >>> > >
> >> > >> >>> > >
> >> > >> >>> > >    1. object Container {
> >> > >> >>> > >    2.     val MyConstant = ...
> >> > >> >>> > >    3. }
> >> > >> >>> > >
> >> > >> >>> > >
> >> > >> >>> > > 2015-03-04 17:11 GMT-08:00 Xiangrui Meng
<mengxr@gmail.com
> >:
> >> > >> >>> > >
> >> > >> >>> > > > Hi all,
> >> > >> >>> > > >
> >> > >> >>> > > > There are many places where we use
enum-like types in
> Spark,
> >> > but
> >> > >> in
> >> > >> >>> > > > different ways. Every approach has
both pros and cons. I
> >> > wonder
> >> > >> >>> > > > whether there should be an "official"
approach for
> enum-like
> >> > >> types in
> >> > >> >>> > > > Spark.
> >> > >> >>> > > >
> >> > >> >>> > > > 1. Scala's Enumeration (e.g., SchedulingMode,
> WorkerState,
> >> > etc)
> >> > >> >>> > > >
> >> > >> >>> > > > * All types show up as Enumeration.Value
in Java.
> >> > >> >>> > > >
> >> > >> >>> > > >
> >> > >> >>> > >
> >> > >> >>> >
> >> > >> >>>
> >> > >>
> >> >
> >>
> http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html
> >> > >> >>> > > >
> >> > >> >>> > > > 2. Java's Enum (e.g., SaveMode, IOMode)
> >> > >> >>> > > >
> >> > >> >>> > > > * Implementation must be in a Java
file.
> >> > >> >>> > > > * Values doesn't show up in the ScalaDoc:
> >> > >> >>> > > >
> >> > >> >>> > > >
> >> > >> >>> > >
> >> > >> >>> >
> >> > >> >>>
> >> > >>
> >> >
> >>
> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode
> >> > >> >>> > > >
> >> > >> >>> > > > 3. Static fields in Java (e.g., TripletFields)
> >> > >> >>> > > >
> >> > >> >>> > > > * Implementation must be in a Java
file.
> >> > >> >>> > > > * Doesn't need "()" in Java code.
> >> > >> >>> > > > * Values don't show up in the ScalaDoc:
> >> > >> >>> > > >
> >> > >> >>> > > >
> >> > >> >>> > >
> >> > >> >>> >
> >> > >> >>>
> >> > >>
> >> >
> >>
> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields
> >> > >> >>> > > >
> >> > >> >>> > > > 4. Objects in Scala. (e.g., StorageLevel)
> >> > >> >>> > > >
> >> > >> >>> > > > * Needs "()" in Java code.
> >> > >> >>> > > > * Values show up in both ScalaDoc
and JavaDoc:
> >> > >> >>> > > >
> >> > >> >>> > > >
> >> > >> >>> > >
> >> > >> >>> >
> >> > >> >>>
> >> > >>
> >> >
> >>
> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$
> >> > >> >>> > > >
> >> > >> >>> > > >
> >> > >> >>> > >
> >> > >> >>> >
> >> > >> >>>
> >> > >>
> >> >
> >>
> http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html
> >> > >> >>> > > >
> >> > >> >>> > > > It would be great if we have an "official"
approach for
> this
> >> > as
> >> > >> well
> >> > >> >>> > > > as the naming convention for enum-like
values
> ("MEMORY_ONLY"
> >> > or
> >> > >> >>> > > > "MemoryOnly"). Personally, I like
4) with "MEMORY_ONLY".
> Any
> >> > >> >>> thoughts?
> >> > >> >>> > > >
> >> > >> >>> > > > Best,
> >> > >> >>> > > > Xiangrui
> >> > >> >>> > > >
> >> > >> >>> > > >
> >> > >>
> ---------------------------------------------------------------------
> >> > >> >>> > > > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> >> > >> >>> > > > For additional commands, e-mail:
> dev-help@spark.apache.org
> >> > >> >>> > > >
> >> > >> >>> > > >
> >> > >> >>> > >
> >> > >> >>> >
> >> > >> >>>
> >> > >>
> >> > >>
> ---------------------------------------------------------------------
> >> > >> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> >> > >> For additional commands, e-mail: dev-help@spark.apache.org
> >> > >>
> >> > >>
> >> >
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message