spark-dev mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: enum-like types in Spark
Date Tue, 10 Mar 2015 01:39:32 GMT
Does this matter for our own internal types in Spark? I don't think
any of these types are designed to be used in RDD records, for
instance.

On Mon, Mar 9, 2015 at 6:25 PM, Aaron Davidson <ilikerps@gmail.com> wrote:
> Perhaps the problem with Java enums that was brought up was actually that
> their hashCode is not stable across JVMs, as it depends on the memory
> location of the enum itself.
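To make Aaron's point concrete, here is a minimal sketch (StorageKind is a hypothetical enum, not a Spark type): Enum.hashCode() is final and falls through to the identity hash, which depends on where the JVM allocated the constant, while name() hashes the same everywhere.

```java
public class EnumHashDemo {
    // Hypothetical enum for illustration only, not a Spark type.
    enum StorageKind { MEMORY_ONLY, DISK_ONLY }

    public static void main(String[] args) {
        StorageKind k = StorageKind.MEMORY_ONLY;
        // Enum does not override Object.hashCode(), so this is the identity
        // hash, which can differ from one JVM run to the next:
        System.out.println(k.hashCode() == System.identityHashCode(k)); // true
        // The name, by contrast, hashes identically in every JVM:
        System.out.println(k.name().hashCode() == "MEMORY_ONLY".hashCode()); // true
    }
}
```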
>
> On Mon, Mar 9, 2015 at 6:15 PM, Imran Rashid <irashid@cloudera.com> wrote:
>
>> Can you expand on the serde issues w/ java enums at all? I haven't heard
>> of any problems specific to enums. The java object serialization rules
>> seem very clear and it doesn't seem like different jvms should have a
>> choice on what they do:
>>
>>
>> http://docs.oracle.com/javase/6/docs/platform/serialization/spec/serial-arch.html#6469
>>
>> (in a nutshell, serialization must use enum.name())
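That guarantee from the spec can be sanity-checked with a quick round-trip: an enum constant is written by name and resolved back to the same singleton on read. (Mode and roundTrip below are hypothetical names for illustration.)

```java
import java.io.*;

public class EnumSerdeDemo {
    // Hypothetical enum for illustration only.
    enum Mode { LOCAL, CLUSTER }

    // Serialize an object to bytes and read it straight back.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Mode before = Mode.CLUSTER;
        Mode after = roundTrip(before);
        // Deserialization resolves the constant by name, so we get back the
        // very same singleton, not a copy:
        System.out.println(after == before); // true
    }
}
```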
>>
>> of course there are plenty of ways the user could screw this up (e.g.
>> rename the enums, or change their meaning, or remove them). But then
>> again, all of java serialization has issues the user has to be aware of.
>> E.g., if we go with case objects, then java serialization blows up if you
>> add another helper method, even if that helper method is completely
>> compatible.
>>
>> Some prior debate in the scala community:
>>
>> https://groups.google.com/d/msg/scala-internals/8RWkccSRBxQ/AN5F_ZbdKIsJ
>>
>> SO post on which version to use in scala:
>>
>>
>> http://stackoverflow.com/questions/1321745/how-to-model-type-safe-enum-types
>>
>> SO post about the macro-craziness people try to add to scala to make them
>> almost as good as a simple java enum:
>> (NB: the accepted answer doesn't actually work in all cases ...)
>>
>>
>> http://stackoverflow.com/questions/20089920/custom-scala-enum-most-elegant-version-searched
>>
>> Another proposal to add better enums built into scala ... but seems to be
>> dormant:
>>
>> https://groups.google.com/forum/#!topic/scala-sips/Bf82LxK02Kk
>>
>>
>>
>> On Thu, Mar 5, 2015 at 10:49 PM, Mridul Muralidharan <mridul@gmail.com>
>> wrote:
>>
>> >   I have a strong dislike for java enums due to the fact that they are
>> > not stable across JVMs - if they undergo serde, you end up with
>> > unpredictable results at times [1]. This is one of the reasons why we
>> > prevent enums from being used as keys: though it is highly possible
>> > users might depend on them internally and shoot themselves in the foot.
>> >
>> > It would be better to keep away from them in general and use something
>> > more stable.
>> >
>> > Regards,
>> > Mridul
>> >
>> > [1] Having had to debug this issue for 2 weeks - I really really hate it.
>> >
>> >
>> > On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <irashid@cloudera.com>
>> wrote:
>> > > I have a very strong dislike for #1 (scala enumerations). I'm ok with
>> > > #4 (with Xiangrui's final suggestion, especially making it sealed &
>> > > available in Java), but I really think #2, java enums, are the best
>> > > option.
>> > >
>> > > Java enums actually have some very real advantages over the other
>> > > approaches -- you get values(), valueOf(), EnumSet, and EnumMap. There
>> > > has been endless debate in the Scala community about the problems with
>> > > the approaches in Scala. Very smart, level-headed Scala gurus have
>> > > complained about their shortcomings (Rex Kerr's name is coming to mind,
>> > > though I'm not positive about that); there have been numerous
>> > > well-thought-out proposals to give Scala a better enum. But the
>> > > powers-that-be in Scala always reject them. IIRC the explanation for
>> > > rejecting them is basically that (a) enums aren't important enough for
>> > > introducing some new special feature, scala's got bigger things to work
>> > > on, and (b) if you really need a good enum, just use java's enum.
>> > >
>> > > I doubt it really matters that much for Spark internals, which is why I
>> > > think #4 is fine. But I figured I'd give my spiel, because every
>> > > developer loves language wars :)
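As a concrete illustration of those advantages, a minimal sketch (Level is a hypothetical enum, not a Spark type):

```java
import java.util.EnumMap;
import java.util.EnumSet;

public class EnumApiDemo {
    // Hypothetical enum for illustration only.
    enum Level { MEMORY_ONLY, DISK_ONLY, OFF_HEAP }

    public static void main(String[] args) {
        // values(): all constants, in declaration order.
        System.out.println(Level.values().length); // 3
        // valueOf(): stable name-based lookup (what serialization relies on).
        System.out.println(Level.valueOf("DISK_ONLY") == Level.DISK_ONLY); // true
        // EnumSet and EnumMap: bitset- and array-backed, cheaper than
        // general-purpose HashSet/HashMap for enum keys.
        EnumSet<Level> persistent = EnumSet.of(Level.MEMORY_ONLY, Level.DISK_ONLY);
        EnumMap<Level, String> desc = new EnumMap<>(Level.class);
        desc.put(Level.OFF_HEAP, "stored outside the JVM heap");
        System.out.println(persistent.contains(Level.OFF_HEAP)); // false
        System.out.println(desc.size()); // 1
    }
}
```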
>> > >
>> > > Imran
>> > >
>> > >
>> > >
>> > > On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <mengxr@gmail.com>
>> wrote:
>> > >
>> > >> `case object` inside an `object` doesn't show up in Java. This is the
>> > >> minimal code I found to make everything show up correctly in both
>> > >> Scala and Java:
>> > >>
>> > >> sealed abstract class StorageLevel // cannot be a trait
>> > >>
>> > >> object StorageLevel {
>> > >>   private[this] case object _MemoryOnly extends StorageLevel
>> > >>   final val MemoryOnly: StorageLevel = _MemoryOnly
>> > >>
>> > >>   private[this] case object _DiskOnly extends StorageLevel
>> > >>   final val DiskOnly: StorageLevel = _DiskOnly
>> > >> }
>> > >>
>> > >> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pwendell@gmail.com>
>> > >> wrote:
>> > >> > I like #4 as well and agree with Aaron's suggestion.
>> > >> >
>> > >> > - Patrick
>> > >> >
>> > >> > On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <ilikerps@gmail.com>
>> > >> wrote:
>> > >> >> I'm cool with #4 as well, but make sure we dictate that the values
>> > >> >> should be defined within an object with the same name as the
>> > >> >> enumeration (like we do for StorageLevel). Otherwise we may pollute
>> > >> >> a higher namespace.
>> > >> >>
>> > >> >> e.g. we SHOULD do:
>> > >> >>
>> > >> >> trait StorageLevel
>> > >> >> object StorageLevel {
>> > >> >>   case object MemoryOnly extends StorageLevel
>> > >> >>   case object DiskOnly extends StorageLevel
>> > >> >> }
>> > >> >>
>> > >> >> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <michael@databricks.com> wrote:
>> > >> >>
>> > >> >>> #4 with a preference for CamelCaseEnums
>> > >> >>>
>> > >> >>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <joseph@databricks.com> wrote:
>> > >> >>>
>> > >> >>> > another vote for #4
>> > >> >>> > People are already used to adding "()" in Java.
>> > >> >>> >
>> > >> >>> >
>> > >> >>> > On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <javadba@gmail.com> wrote:
>> > >> >>> >
>> > >> >>> > > #4 but with MemoryOnly (more scala-like)
>> > >> >>> > >
>> > >> >>> > > http://docs.scala-lang.org/style/naming-conventions.html
>> > >> >>> > >
>> > >> >>> > > Constants, Values, Variable and Methods
>> > >> >>> > >
>> > >> >>> > > Constant names should be in upper camel case. That is, if the
>> > >> >>> > > member is final, immutable and it belongs to a package object
>> > >> >>> > > or an object, it may be considered a constant (similar to
>> > >> >>> > > Java's static final members):
>> > >> >>> > >
>> > >> >>> > >    object Container {
>> > >> >>> > >      val MyConstant = ...
>> > >> >>> > >    }
>> > >> >>> > >
>> > >> >>> > >
>> > >> >>> > > 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <mengxr@gmail.com>:
>> > >> >>> > >
>> > >> >>> > > > Hi all,
>> > >> >>> > > >
>> > >> >>> > > > There are many places where we use enum-like types in Spark,
>> > >> >>> > > > but in different ways. Every approach has both pros and cons.
>> > >> >>> > > > I wonder whether there should be an "official" approach for
>> > >> >>> > > > enum-like types in Spark.
>> > >> >>> > > >
>> > >> >>> > > > 1. Scala's Enumeration (e.g., SchedulingMode, WorkerState, etc.)
>> > >> >>> > > >
>> > >> >>> > > > * All types show up as Enumeration.Value in Java.
>> > >> >>> > > >
>> > >> >>> > > > http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html
>> > >> >>> > > >
>> > >> >>> > > > 2. Java's Enum (e.g., SaveMode, IOMode)
>> > >> >>> > > >
>> > >> >>> > > > * Implementation must be in a Java file.
>> > >> >>> > > > * Values don't show up in the ScalaDoc:
>> > >> >>> > > >
>> > >> >>> > > > http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode
>> > >> >>> > > >
>> > >> >>> > > > 3. Static fields in Java (e.g., TripletFields)
>> > >> >>> > > >
>> > >> >>> > > > * Implementation must be in a Java file.
>> > >> >>> > > > * Doesn't need "()" in Java code.
>> > >> >>> > > > * Values don't show up in the ScalaDoc:
>> > >> >>> > > >
>> > >> >>> > > > http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields
>> > >> >>> > > >
>> > >> >>> > > > 4. Objects in Scala (e.g., StorageLevel)
>> > >> >>> > > >
>> > >> >>> > > > * Needs "()" in Java code.
>> > >> >>> > > > * Values show up in both ScalaDoc and JavaDoc:
>> > >> >>> > > >
>> > >> >>> > > > http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$
>> > >> >>> > > >
>> > >> >>> > > > http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html
>> > >> >>> > > >
>> > >> >>> > > > It would be great if we had an "official" approach for this
>> > >> >>> > > > as well as the naming convention for enum-like values
>> > >> >>> > > > ("MEMORY_ONLY" or "MemoryOnly"). Personally, I like 4) with
>> > >> >>> > > > "MEMORY_ONLY". Any thoughts?
>> > >> >>> > > >
>> > >> >>> > > > Best,
>> > >> >>> > > > Xiangrui
>> > >> >>> > > >
>> > >> >>> > > >
>> > >> ---------------------------------------------------------------------
>> > >> >>> > > > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> > >> >>> > > > For additional commands, e-mail: dev-help@spark.apache.org
>> > >> >>> > > >
>> > >> >>> > > >


