spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Imran Rashid <iras...@cloudera.com>
Subject Re: enum-like types in Spark
Date Mon, 23 Mar 2015 20:33:27 GMT
I've just switched some of my code over to the new format, and I just want
to make sure everyone realizes what we are getting into.  I went from 10
lines as java enums

https://github.com/squito/spark/blob/fef66058612ebf225e58dd5f5fea6bae1afd5b31/core/src/main/java/org/apache/spark/status/api/StageStatus.java#L20

to 30 lines with the new format:

https://github.com/squito/spark/blob/SPARK-3454_w_jersey/core/src/main/scala/org/apache/spark/status/api/v1/api.scala#L250

its not just that its verbose.  each name has to be repeated 4 times, with
potential typos in some locations that won't be caught by the compiler.
Also, you have to manually maintain the "values" as you update the set of
enums, the compiler won't do it for you.

The only downside I've heard for java enums is enum.hashcode().  OTOH, the
downsides for this version are: maintainability / verbosity, no values(),
more cumbersome to use from java, no enum map / enumset.

I did put together a little util to at least get back the equivalent of
enum.valueOf() with this format

https://github.com/squito/spark/blob/SPARK-3454_w_jersey/core/src/main/scala/org/apache/spark/util/SparkEnum.scala

I'm not trying to prevent us from moving forward on this, its fine if this
is still what everyone wants, but I feel pretty strongly java enums make
more sense.

thanks,
Imran


On Tue, Mar 17, 2015 at 2:07 PM, Xiangrui Meng <mengxr@gmail.com> wrote:

> Let me put a quick summary. #4 got majority vote with CamelCase but
> not UPPERCASE. The following is a minimal implementation that works
> for both Scala and Java. In Python, we use string for enums. This
> proposal is only for new public APIs. We are not going to change
> existing ones. -Xiangrui
>
> ~~~
> sealed abstract class StorageLevel
>
> object StorageLevel {
>
>   def fromString(name: String): StorageLevel = ???
>
>   val MemoryOnly: StorageLevel = {
>     case object MemoryOnly extends StorageLevel
>     MemoryOnly
>   }
>
>   val DiskOnly: StorageLevel = {
>     case object DiskOnly extends StorageLevel
>     DiskOnly
>  }
> }
> ~~~
>
> On Mon, Mar 16, 2015 at 3:04 PM, Aaron Davidson <ilikerps@gmail.com>
> wrote:
> > It's unrelated to the proposal, but Enum#ordinal() should be much faster,
> > assuming it's not serialized to JVMs with different versions of the enum
> :)
> >
> > On Mon, Mar 16, 2015 at 12:12 PM, Kevin Markey <kevin.markey@oracle.com>
> > wrote:
> >
> >> In some applications, I have rather heavy use of Java enums which are
> >> needed for related Java APIs that the application uses.  And
> unfortunately,
> >> they are also used as keys.  As such, using the native hashcodes makes
> any
> >> function over keys unstable and unpredictable, so we now use
> Enum.name() as
> >> the key instead.  Oh well.  But it works and seems to work well.
> >>
> >> Kevin
> >>
> >>
> >> On 03/05/2015 09:49 PM, Mridul Muralidharan wrote:
> >>
> >>>    I have a strong dislike for java enum's due to the fact that they
> >>> are not stable across JVM's - if it undergoes serde, you end up with
> >>> unpredictable results at times [1].
> >>> One of the reasons why we prevent enum's from being key : though it is
> >>> highly possible users might depend on it internally and shoot
> >>> themselves in the foot.
> >>>
> >>> Would be better to keep away from them in general and use something
> more
> >>> stable.
> >>>
> >>> Regards,
> >>> Mridul
> >>>
> >>> [1] Having had to debug this issue for 2 weeks - I really really hate
> it.
> >>>
> >>>
> >>> On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <irashid@cloudera.com>
> >>> wrote:
> >>>
> >>>> I have a very strong dislike for #1 (scala enumerations).   I'm ok
> with
> >>>> #4
> >>>> (with Xiangrui's final suggestion, especially making it sealed &
> >>>> available
> >>>> in Java), but I really think #2, java enums, are the best option.
> >>>>
> >>>> Java enums actually have some very real advantages over the other
> >>>> approaches -- you get values(), valueOf(), EnumSet, and EnumMap.
> There
> >>>> has
> >>>> been endless debate in the Scala community about the problems with the
> >>>> approaches in Scala.  Very smart, level-headed Scala gurus have
> >>>> complained
> >>>> about their short-comings (Rex Kerr's name is coming to mind, though
> I'm
> >>>> not positive about that); there have been numerous well-thought out
> >>>> proposals to give Scala a better enum.  But the powers-that-be in
> Scala
> >>>> always reject them.  IIRC the explanation for rejecting is basically
> that
> >>>> (a) enums aren't important enough for introducing some new special
> >>>> feature,
> >>>> scala's got bigger things to work on and (b) if you really need a good
> >>>> enum, just use java's enum.
> >>>>
> >>>> I doubt it really matters that much for Spark internals, which is why
> I
> >>>> think #4 is fine.  But I figured I'd give my spiel, because every
> >>>> developer
> >>>> loves language wars :)
> >>>>
> >>>> Imran
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <mengxr@gmail.com>
> wrote:
> >>>>
> >>>>  `case object` inside an `object` doesn't show up in Java. This is the
> >>>>> minimal code I found to make everything show up correctly in both
> >>>>> Scala and Java:
> >>>>>
> >>>>> sealed abstract class StorageLevel // cannot be a trait
> >>>>>
> >>>>> object StorageLevel {
> >>>>>    private[this] case object _MemoryOnly extends StorageLevel
> >>>>>    final val MemoryOnly: StorageLevel = _MemoryOnly
> >>>>>
> >>>>>    private[this] case object _DiskOnly extends StorageLevel
> >>>>>    final val DiskOnly: StorageLevel = _DiskOnly
> >>>>> }
> >>>>>
> >>>>> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pwendell@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> I like #4 as well and agree with Aaron's suggestion.
> >>>>>>
> >>>>>> - Patrick
> >>>>>>
> >>>>>> On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <ilikerps@gmail.com>
> >>>>>>
> >>>>> wrote:
> >>>>>
> >>>>>> I'm cool with #4 as well, but make sure we dictate that the
values
> >>>>>>>
> >>>>>> should
> >>>>>
> >>>>>> be defined within an object with the same name as the enumeration
> (like
> >>>>>>>
> >>>>>> we
> >>>>>
> >>>>>> do for StorageLevel). Otherwise we may pollute a higher namespace.
> >>>>>>>
> >>>>>>> e.g. we SHOULD do:
> >>>>>>>
> >>>>>>> trait StorageLevel
> >>>>>>> object StorageLevel {
> >>>>>>>    case object MemoryOnly extends StorageLevel
> >>>>>>>    case object DiskOnly extends StorageLevel
> >>>>>>> }
> >>>>>>>
> >>>>>>> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <
> >>>>>>>
> >>>>>> michael@databricks.com>
> >>>>>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>  #4 with a preference for CamelCaseEnums
> >>>>>>>>
> >>>>>>>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <
> >>>>>>>> joseph@databricks.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>  another vote for #4
> >>>>>>>>> People are already used to adding "()" in Java.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <
> javadba@gmail.com>
> >>>>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> #4 but with MemoryOnly (more scala-like)
> >>>>>>>>>>
> >>>>>>>>>> http://docs.scala-lang.org/style/naming-conventions.html
> >>>>>>>>>>
> >>>>>>>>>> Constants, Values, Variable and Methods
> >>>>>>>>>>
> >>>>>>>>>> Constant names should be in upper camel case.
That is, if the
> >>>>>>>>>>
> >>>>>>>>> member is
> >>>>>
> >>>>>> final, immutable and it belongs to a package object or an object,
> >>>>>>>>>>
> >>>>>>>>> it
> >>>>>
> >>>>>> may
> >>>>>>>>
> >>>>>>>>> be
> >>>>>>>>>
> >>>>>>>>>> considered a constant (similar to Java'sstatic
final members):
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>     1. object Container {
> >>>>>>>>>>     2.     val MyConstant = ...
> >>>>>>>>>>     3. }
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <mengxr@gmail.com>:
> >>>>>>>>>>
> >>>>>>>>>>  Hi all,
> >>>>>>>>>>>
> >>>>>>>>>>> There are many places where we use enum-like
types in Spark,
> but
> >>>>>>>>>>>
> >>>>>>>>>> in
> >>>>>
> >>>>>> different ways. Every approach has both pros and cons. I wonder
> >>>>>>>>>>> whether there should be an "official" approach
for enum-like
> >>>>>>>>>>>
> >>>>>>>>>> types in
> >>>>>
> >>>>>> Spark.
> >>>>>>>>>>>
> >>>>>>>>>>> 1. Scala's Enumeration (e.g., SchedulingMode,
WorkerState, etc)
> >>>>>>>>>>>
> >>>>>>>>>>> * All types show up as Enumeration.Value
in Java.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  http://spark.apache.org/docs/latest/api/java/org/apache/
> >>>>> spark/scheduler/SchedulingMode.html
> >>>>>
> >>>>>> 2. Java's Enum (e.g., SaveMode, IOMode)
> >>>>>>>>>>>
> >>>>>>>>>>> * Implementation must be in a Java file.
> >>>>>>>>>>> * Values doesn't show up in the ScalaDoc:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  http://spark.apache.org/docs/latest/api/scala/#org.apache.
> >>>>> spark.network.util.IOMode
> >>>>>
> >>>>>> 3. Static fields in Java (e.g., TripletFields)
> >>>>>>>>>>>
> >>>>>>>>>>> * Implementation must be in a Java file.
> >>>>>>>>>>> * Doesn't need "()" in Java code.
> >>>>>>>>>>> * Values don't show up in the ScalaDoc:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  http://spark.apache.org/docs/latest/api/scala/#org.apache.
> >>>>> spark.graphx.TripletFields
> >>>>>
> >>>>>> 4. Objects in Scala. (e.g., StorageLevel)
> >>>>>>>>>>>
> >>>>>>>>>>> * Needs "()" in Java code.
> >>>>>>>>>>> * Values show up in both ScalaDoc and JavaDoc:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  http://spark.apache.org/docs/latest/api/scala/#org.apache.
> >>>>> spark.storage.StorageLevel$
> >>>>>
> >>>>>>
> >>>>>>>>>>>  http://spark.apache.org/docs/latest/api/java/org/apache/
> >>>>> spark/storage/StorageLevel.html
> >>>>>
> >>>>>> It would be great if we have an "official" approach for this
as
> >>>>>>>>>>>
> >>>>>>>>>> well
> >>>>>
> >>>>>> as the naming convention for enum-like values ("MEMORY_ONLY"
or
> >>>>>>>>>>> "MemoryOnly"). Personally, I like 4) with
"MEMORY_ONLY". Any
> >>>>>>>>>>>
> >>>>>>>>>> thoughts?
> >>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>>> Xiangrui
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  ------------------------------------------------------------
> >>>>> ---------
> >>>>>
> >>>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> >>>>>>>>>>> For additional commands, e-mail: dev-help@spark.apache.org
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  ------------------------------------------------------------
> >>>>> ---------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> >>>>> For additional commands, e-mail: dev-help@spark.apache.org
> >>>>>
> >>>>>
> >>>>>
> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> >>> For additional commands, e-mail: dev-help@spark.apache.org
> >>>
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> >> For additional commands, e-mail: dev-help@spark.apache.org
> >>
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message