spark-dev mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: enum-like types in Spark
Date Mon, 16 Mar 2015 20:54:39 GMT
Hey Xiangrui,

Do you want to write up a straw man proposal based on this line of discussion?

- Patrick

On Mon, Mar 16, 2015 at 12:12 PM, Kevin Markey <kevin.markey@oracle.com> wrote:
> In some applications I make rather heavy use of Java enums, which are
> required by related Java APIs that the application uses. Unfortunately,
> they are also used as keys. As such, relying on their native hash codes
> makes any function over keys unstable and unpredictable, so we now use
> Enum.name() as the key instead. Oh well. But it works, and it seems to
> work well.
>
> Kevin
>
>
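Kevin's workaround can be sketched in plain Java; `Level` below is a hypothetical enum standing in for the application's own Java-API types:

```java
import java.util.HashMap;
import java.util.Map;

public class StableKeys {
    // Hypothetical enum standing in for the application's Java-API enums.
    enum Level { MEMORY_ONLY, DISK_ONLY }

    public static void main(String[] args) {
        // Key the map by Enum.name() -- a plain String whose hashCode is
        // specified by the JLS -- rather than by the enum constant itself,
        // whose hashCode is identity-based and can differ between JVM runs.
        Map<String, Integer> counts = new HashMap<>();
        counts.put(Level.MEMORY_ONLY.name(), 1);

        // The original constant is recoverable from the stable string key.
        Level roundTrip = Level.valueOf("MEMORY_ONLY");
        System.out.println(roundTrip == Level.MEMORY_ONLY);  // true
        System.out.println(counts.get(roundTrip.name()));    // 1
    }
}
```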
> On 03/05/2015 09:49 PM, Mridul Muralidharan wrote:
>>
>>    I have a strong dislike for Java enums due to the fact that they
>> are not stable across JVMs: if one undergoes serde, you can end up
>> with unpredictable results at times [1].
>> That is one of the reasons we prevent enums from being keys, though it
>> is highly possible that users depend on them internally and shoot
>> themselves in the foot.
>>
>> It would be better to keep away from them in general and use something
>> more stable.
>>
>> Regards,
>> Mridul
>>
>> [1] Having had to debug this issue for 2 weeks - I really really hate it.
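The instability Mridul describes comes from java.lang.Enum inheriting Object's identity-based hashCode(), which is assigned per JVM instance; only name() and ordinal() are specified to be stable. A minimal illustration (the `Mode` enum here is hypothetical):

```java
public class EnumHash {
    enum Mode { LOCAL, CLUSTER }

    public static void main(String[] args) {
        // Enum.hashCode() delegates to Object.hashCode(): an identity hash
        // chosen at runtime, so the same constant can hash differently in
        // another JVM -- fatal for hash-partitioned keys.
        System.out.println(Mode.LOCAL.hashCode());

        // String.hashCode() is fully specified by the JLS, so
        // name().hashCode() is identical in every JVM.
        System.out.println(Mode.LOCAL.name().hashCode());

        // ordinal() is likewise stable as long as declaration order is fixed.
        System.out.println(Mode.LOCAL.ordinal());  // 0
    }
}
```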
>>
>>
>> On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <irashid@cloudera.com> wrote:
>>>
>>> I have a very strong dislike for #1 (Scala Enumerations). I'm OK with
>>> #4 (with Xiangrui's final suggestion, especially making it sealed and
>>> available in Java), but I really think #2, Java enums, is the best
>>> option.
>>>
>>> Java enums actually have some very real advantages over the other
>>> approaches -- you get values(), valueOf(), EnumSet, and EnumMap. There
>>> has been endless debate in the Scala community about the problems with
>>> the approaches available in Scala. Very smart, level-headed Scala gurus
>>> have complained about their shortcomings (Rex Kerr's name comes to
>>> mind, though I'm not positive about that), and there have been numerous
>>> well-thought-out proposals to give Scala a better enum. But the
>>> powers-that-be in Scala always reject them. IIRC the explanation for
>>> rejecting them is basically that (a) enums aren't important enough to
>>> justify a new special feature -- Scala has bigger things to work on --
>>> and (b) if you really need a good enum, just use Java's.
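The conveniences listed above come for free with any Java enum; a minimal sketch (this `StorageLevel` is hypothetical, not Spark's actual class):

```java
import java.util.EnumMap;
import java.util.EnumSet;

public class EnumPerks {
    // Hypothetical enum for illustration only.
    enum StorageLevel { MEMORY_ONLY, DISK_ONLY, MEMORY_AND_DISK }

    public static void main(String[] args) {
        // values() and valueOf() are generated by the compiler.
        System.out.println(StorageLevel.values().length);   // 3
        StorageLevel lvl = StorageLevel.valueOf("DISK_ONLY");

        // EnumSet and EnumMap are compact, ordinal-indexed collections
        // available only for enum types.
        EnumSet<StorageLevel> memoryBacked =
            EnumSet.of(StorageLevel.MEMORY_ONLY, StorageLevel.MEMORY_AND_DISK);
        EnumMap<StorageLevel, String> descriptions =
            new EnumMap<>(StorageLevel.class);
        descriptions.put(lvl, "spill everything to disk");

        System.out.println(memoryBacked.contains(lvl));  // false
        System.out.println(descriptions.get(lvl));
    }
}
```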
>>>
>>> I doubt it really matters that much for Spark internals, which is why
>>> I think #4 is fine. But I figured I'd give my spiel, because every
>>> developer loves language wars :)
>>>
>>> Imran
>>>
>>>
>>>
>>> On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <mengxr@gmail.com> wrote:
>>>
>>>> `case object` inside an `object` doesn't show up in Java. This is the
>>>> minimal code I found to make everything show up correctly in both
>>>> Scala and Java:
>>>>
>>>> sealed abstract class StorageLevel // cannot be a trait
>>>>
>>>> object StorageLevel {
>>>>    private[this] case object _MemoryOnly extends StorageLevel
>>>>    final val MemoryOnly: StorageLevel = _MemoryOnly
>>>>
>>>>    private[this] case object _DiskOnly extends StorageLevel
>>>>    final val DiskOnly: StorageLevel = _DiskOnly
>>>> }
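For comparison, the effect this Scala pattern has on Java callers -- named constants of a type that cannot be instantiated or extended from outside -- can be approximated in plain Java with a private constructor and static final fields (roughly approach #3 below). This sketch is illustrative only, not Spark code:

```java
public class Levels {
    // Emulates the sealed-constants pattern: the constructor is private,
    // so the only instances are the ones declared here.
    public static final class StorageLevel {
        public static final StorageLevel MEMORY_ONLY = new StorageLevel("MEMORY_ONLY");
        public static final StorageLevel DISK_ONLY = new StorageLevel("DISK_ONLY");

        private final String name;

        private StorageLevel(String name) { this.name = name; }

        @Override
        public String toString() { return name; }
    }

    public static void main(String[] args) {
        // Reference equality works because each constant is a singleton.
        StorageLevel lvl = StorageLevel.MEMORY_ONLY;
        System.out.println(lvl == StorageLevel.MEMORY_ONLY);  // true
        System.out.println(lvl);                              // MEMORY_ONLY
    }
}
```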
>>>>
>>>> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pwendell@gmail.com>
>>>> wrote:
>>>>>
>>>>> I like #4 as well and agree with Aaron's suggestion.
>>>>>
>>>>> - Patrick
>>>>>
>>>>> On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <ilikerps@gmail.com> wrote:
>>>>>>
>>>>>> I'm cool with #4 as well, but make sure we dictate that the values
>>>>>> should be defined within an object with the same name as the
>>>>>> enumeration (like we do for StorageLevel). Otherwise we may pollute
>>>>>> a higher namespace.
>>>>>>
>>>>>> e.g. we SHOULD do:
>>>>>>
>>>>>> trait StorageLevel
>>>>>> object StorageLevel {
>>>>>>    case object MemoryOnly extends StorageLevel
>>>>>>    case object DiskOnly extends StorageLevel
>>>>>> }
>>>>>>
>>>>>> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust
>>>>>> <michael@databricks.com> wrote:
>>>>>>
>>>>>>> #4 with a preference for CamelCaseEnums
>>>>>>>
>>>>>>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley
>>>>>>> <joseph@databricks.com> wrote:
>>>>>>>
>>>>>>>> another vote for #4
>>>>>>>> People are already used to adding "()" in Java.
>>>>>>>>
>>>>>>>>
>>>>>>> On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <javadba@gmail.com>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> #4 but with MemoryOnly (more Scala-like)
>>>>>>>>>
>>>>>>>>> http://docs.scala-lang.org/style/naming-conventions.html
>>>>>>>>>
>>>>>>>>> Constants, Values, Variables and Methods
>>>>>>>>>
>>>>>>>>> Constant names should be in upper camel case. That is, if the
>>>>>>>>> member is final, immutable and it belongs to a package object or
>>>>>>>>> an object, it may be considered a constant (similar to Java's
>>>>>>>>> static final members):
>>>>>>>>>
>>>>>>>>> object Container {
>>>>>>>>>   val MyConstant = ...
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <mengxr@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> There are many places where we use enum-like types in Spark, but
>>>>>>>>>> in different ways. Every approach has pros and cons. I wonder
>>>>>>>>>> whether there should be an "official" approach for enum-like
>>>>>>>>>> types in Spark.
>>>>>>>>>>
>>>>>>>>>> 1. Scala's Enumeration (e.g., SchedulingMode, WorkerState, etc.)
>>>>>>>>>>
>>>>>>>>>> * All types show up as Enumeration.Value in Java:
>>>>>>>>>>
>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html
>>>>>>>>>>
>>>>>>>>>> 2. Java's Enum (e.g., SaveMode, IOMode)
>>>>>>>>>>
>>>>>>>>>> * Implementation must be in a Java file.
>>>>>>>>>> * Values don't show up in the ScalaDoc:
>>>>>>>>>>
>>>>>>>>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode
>>>>>>>>>>
>>>>>>>>>> 3. Static fields in Java (e.g., TripletFields)
>>>>>>>>>>
>>>>>>>>>> * Implementation must be in a Java file.
>>>>>>>>>> * Doesn't need "()" in Java code.
>>>>>>>>>> * Values don't show up in the ScalaDoc:
>>>>>>>>>>
>>>>>>>>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields
>>>>>>>>>>
>>>>>>>>>> 4. Objects in Scala (e.g., StorageLevel)
>>>>>>>>>>
>>>>>>>>>> * Needs "()" in Java code.
>>>>>>>>>> * Values show up in both ScalaDoc and JavaDoc:
>>>>>>>>>>
>>>>>>>>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$
>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html
>>>>>>>>>>
>>>>>>>>>> It would be great if we had an "official" approach for this, as
>>>>>>>>>> well as a naming convention for enum-like values ("MEMORY_ONLY"
>>>>>>>>>> or "MemoryOnly"). Personally, I like 4) with "MEMORY_ONLY". Any
>>>>>>>>>> thoughts?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Xiangrui
>>>>>>>>>>
>>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

