spark-dev mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: enum-like types in Spark
Date Tue, 10 Mar 2015 01:25:45 GMT
Perhaps the problem with Java enums that was brought up was actually that
their hashCode is not stable across JVMs: Enum.hashCode() is the identity
hash inherited from Object, so it varies from one JVM run to the next.
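
A minimal sketch of the hashCode point, using the JDK's own TimeUnit enum as
a stand-in for any Java enum. The object and method names (EnumHashDemo,
identityBased, stableKey) are made up for illustration. It only shows that an
enum's hashCode is the identity hash, and that a value derived from name()
is a deterministic alternative:

```scala
import java.util.concurrent.TimeUnit

// Hypothetical demo object; not part of Spark.
object EnumHashDemo {
  // Enum.hashCode() is final and delegates to Object.hashCode(),
  // so it equals the identity hash of the enum instance.
  def identityBased(e: Enum[_]): Boolean =
    e.hashCode == System.identityHashCode(e)

  // String.hashCode is specified by the JDK, so a key derived from
  // name() is the same in every JVM.
  def stableKey(e: Enum[_]): Int = e.name.hashCode
}

println(EnumHashDemo.identityBased(TimeUnit.SECONDS))
println(EnumHashDemo.stableKey(TimeUnit.SECONDS))
```

Anything that partitions or groups by hashCode across processes would need
something like stableKey (or ordinal()) rather than the enum's own hashCode.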

On Mon, Mar 9, 2015 at 6:15 PM, Imran Rashid <irashid@cloudera.com> wrote:

> Can you expand on the serde issues w/ Java enums at all?  I haven't heard
> of any problems specific to enums.  The Java object serialization rules
> seem very clear and it doesn't seem like different JVMs should have a
> choice on what they do:
>
>
> http://docs.oracle.com/javase/6/docs/platform/serialization/spec/serial-arch.html#6469
>
> (in a nutshell, serialization must use enum.name())
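
A small sketch of what that rule means in practice, assuming plain Java
serialization and using the JDK's TimeUnit enum as an example. EnumSerdeDemo
and roundTrip are hypothetical names. Because an enum constant is written by
name and resolved again on read, the round trip should hand back the very
same instance:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.concurrent.TimeUnit

// Hypothetical demo object; not part of Spark.
object EnumSerdeDemo {
  // Serialize a value to bytes with Java serialization and read it back.
  def roundTrip(value: AnyRef): AnyRef = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject() finally in.close()
  }
}

// Reference equality: the deserialized constant is the singleton itself.
println(EnumSerdeDemo.roundTrip(TimeUnit.SECONDS) eq TimeUnit.SECONDS)
```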
>
> Of course there are plenty of ways the user could screw this up (e.g. rename
> the enums, or change their meaning, or remove them).  But then again, all
> of Java serialization has compatibility issues the user has to be aware
> of.  E.g., if we go with case objects, then Java serialization blows up if
> you add another helper method, even if that helper method is completely
> compatible.
>
> Some prior debate in the scala community:
>
> https://groups.google.com/d/msg/scala-internals/8RWkccSRBxQ/AN5F_ZbdKIsJ
>
> SO post on which version to use in scala:
>
>
> http://stackoverflow.com/questions/1321745/how-to-model-type-safe-enum-types
>
> SO post about the macro-craziness people try to add to scala to make them
> almost as good as a simple java enum:
> (NB: the accepted answer doesn't actually work in all cases ...)
>
>
> http://stackoverflow.com/questions/20089920/custom-scala-enum-most-elegant-version-searched
>
> Another proposal to add better enums built into scala ... but seems to be
> dormant:
>
> https://groups.google.com/forum/#!topic/scala-sips/Bf82LxK02Kk
>
>
>
> On Thu, Mar 5, 2015 at 10:49 PM, Mridul Muralidharan <mridul@gmail.com>
> wrote:
>
> >   I have a strong dislike for Java enums due to the fact that they
> > are not stable across JVMs - if one undergoes serde, you end up with
> > unpredictable results at times [1].
> > This is one of the reasons why we prevent enums from being keys, though
> > it is highly possible users might depend on them internally and shoot
> > themselves in the foot.
> >
> > Would be better to keep away from them in general and use something more
> > stable.
> >
> > Regards,
> > Mridul
> >
> > [1] Having had to debug this issue for 2 weeks - I really really hate it.
> >
> >
> > On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <irashid@cloudera.com> wrote:
> > > I have a very strong dislike for #1 (scala enumerations).   I'm ok with #4
> > > (with Xiangrui's final suggestion, especially making it sealed & available
> > > in Java), but I really think #2, java enums, are the best option.
> > >
> > > > Java enums actually have some very real advantages over the other
> > > > approaches -- you get values(), valueOf(), EnumSet, and EnumMap.  There has
> > > > been endless debate in the Scala community about the problems with the
> > > > approaches in Scala.  Very smart, level-headed Scala gurus have complained
> > > > about their short-comings (Rex Kerr's name is coming to mind, though I'm
> > > > not positive about that); there have been numerous well-thought out
> > > > proposals to give Scala a better enum.  But the powers-that-be in Scala
> > > > always reject them.  IIRC the explanation for rejecting is basically that
> > > > (a) enums aren't important enough for introducing some new special feature,
> > > > scala's got bigger things to work on and (b) if you really need a good
> > > > enum, just use java's enum.
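
For comparison, a rough sketch of what the case-object approach (#4) needs
in order to approximate values() and valueOf() by hand. Every name here
(DemoLevel, values, fromName) is hypothetical and this is not Spark's actual
StorageLevel; it only illustrates the bookkeeping that a Java enum provides
for free:

```scala
// Hypothetical enum-like type in the style of option #4.
sealed abstract class DemoLevel

object DemoLevel {
  case object MemoryOnly extends DemoLevel
  case object DiskOnly extends DemoLevel

  // Hand-maintained stand-in for Java's values(); it must be updated
  // whenever a new case object is added.
  val values: Seq[DemoLevel] = Seq(MemoryOnly, DiskOnly)

  // Stand-in for Java's valueOf(); relies on a case object's default
  // toString being its simple name.
  def fromName(name: String): DemoLevel =
    values.find(_.toString == name).getOrElse(
      throw new IllegalArgumentException(s"No DemoLevel named '$name'"))
}

println(DemoLevel.fromName("DiskOnly"))
```

Because the class is sealed, the compiler can at least warn about
non-exhaustive pattern matches, but nothing keeps the values list in sync
with the declared case objects.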
> > >
> > > > I doubt it really matters that much for Spark internals, which is why I
> > > > think #4 is fine.  But I figured I'd give my spiel, because every developer
> > > > loves language wars :)
> > >
> > > Imran
> > >
> > >
> > >
> > > > On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <mengxr@gmail.com> wrote:
> > >
> > >> `case object` inside an `object` doesn't show up in Java. This is the
> > >> minimal code I found to make everything show up correctly in both
> > >> Scala and Java:
> > >>
> > >> sealed abstract class StorageLevel // cannot be a trait
> > >>
> > >> object StorageLevel {
> > >>   private[this] case object _MemoryOnly extends StorageLevel
> > >>   final val MemoryOnly: StorageLevel = _MemoryOnly
> > >>
> > >>   private[this] case object _DiskOnly extends StorageLevel
> > >>   final val DiskOnly: StorageLevel = _DiskOnly
> > >> }
> > >>
> > >> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pwendell@gmail.com> wrote:
> > >> > I like #4 as well and agree with Aaron's suggestion.
> > >> >
> > >> > - Patrick
> > >> >
> > >> > On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <ilikerps@gmail.com> wrote:
> > >> >> I'm cool with #4 as well, but make sure we dictate that the values should
> > >> >> be defined within an object with the same name as the enumeration (like we
> > >> >> do for StorageLevel). Otherwise we may pollute a higher namespace.
> > >> >>
> > >> >> e.g. we SHOULD do:
> > >> >>
> > >> >> trait StorageLevel
> > >> >> object StorageLevel {
> > >> >>   case object MemoryOnly extends StorageLevel
> > >> >>   case object DiskOnly extends StorageLevel
> > >> >> }
> > >> >>
> > >> >> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <michael@databricks.com> wrote:
> > >> >>
> > >> >>> #4 with a preference for CamelCaseEnums
> > >> >>>
> > >> >>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <joseph@databricks.com> wrote:
> > >> >>>
> > >> >>> > another vote for #4
> > >> >>> > People are already used to adding "()" in Java.
> > >> >>> >
> > >> >>> >
> > >> >>> > On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <javadba@gmail.com> wrote:
> > >> >>> >
> > >> >>> > > #4 but with MemoryOnly (more scala-like)
> > >> >>> > >
> > >> >>> > > http://docs.scala-lang.org/style/naming-conventions.html
> > >> >>> > >
> > >> >>> > > Constants, Values, Variable and Methods
> > >> >>> > >
> > >> >>> > > Constant names should be in upper camel case. That is, if the
> > >> >>> > > member is final, immutable and it belongs to a package object or
> > >> >>> > > an object, it may be considered a constant (similar to Java's
> > >> >>> > > static final members):
> > >> >>> > >
> > >> >>> > >
> > >> >>> > >    object Container {
> > >> >>> > >        val MyConstant = ...
> > >> >>> > >    }
> > >> >>> > >
> > >> >>> > >
> > >> >>> > > 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <mengxr@gmail.com>:
> > >> >>> > >
> > >> >>> > > > Hi all,
> > >> >>> > > >
> > >> >>> > > > There are many places where we use enum-like types in Spark, but
> > >> >>> > > > in different ways. Every approach has both pros and cons. I
> > >> >>> > > > wonder whether there should be an "official" approach for
> > >> >>> > > > enum-like types in Spark.
> > >> >>> > > >
> > >> >>> > > > 1. Scala's Enumeration (e.g., SchedulingMode, WorkerState, etc)
> > >> >>> > > >
> > >> >>> > > > * All types show up as Enumeration.Value in Java.
> > >> >>> > > >
> > >> >>> > > > http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html
> > >> >>> > > >
> > >> >>> > > > 2. Java's Enum (e.g., SaveMode, IOMode)
> > >> >>> > > >
> > >> >>> > > > * Implementation must be in a Java file.
> > >> >>> > > > * Values don't show up in the ScalaDoc:
> > >> >>> > > >
> > >> >>> > > > http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode
> > >> >>> > > >
> > >> >>> > > > 3. Static fields in Java (e.g., TripletFields)
> > >> >>> > > >
> > >> >>> > > > * Implementation must be in a Java file.
> > >> >>> > > > * Doesn't need "()" in Java code.
> > >> >>> > > > * Values don't show up in the ScalaDoc:
> > >> >>> > > >
> > >> >>> > > > http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields
> > >> >>> > > >
> > >> >>> > > > 4. Objects in Scala. (e.g., StorageLevel)
> > >> >>> > > >
> > >> >>> > > > * Needs "()" in Java code.
> > >> >>> > > > * Values show up in both ScalaDoc and JavaDoc:
> > >> >>> > > >
> > >> >>> > > > http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$
> > >> >>> > > >
> > >> >>> > > >
> > >> >>> > > > http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html
> > >> >>> > > >
> > >> >>> > > > It would be great if we have an "official" approach for this as
> > >> >>> > > > well as the naming convention for enum-like values ("MEMORY_ONLY"
> > >> >>> > > > or "MemoryOnly"). Personally, I like 4) with "MEMORY_ONLY". Any
> > >> >>> > > > thoughts?
> > >> >>> > > >
> > >> >>> > > > Best,
> > >> >>> > > > Xiangrui
> > >> >>> > > >
> > >> >>> > > >
> > >> >>> > > > ---------------------------------------------------------------------
> > >> >>> > > > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> > >> >>> > > > For additional commands, e-mail: dev-help@spark.apache.org
