spark-user mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: Using custom class as a key for groupByKey() or reduceByKey()
Date Sun, 15 Jun 2014 17:03:58 GMT
Good point Sean.  I've filed a ticket to document the equals() / hashCode()
requirements for custom keys in the Spark documentation, as this has come
up a few times on the user@ list.

https://issues.apache.org/jira/browse/SPARK-2148
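
For anyone who finds this thread later, here is a minimal sketch of a key class that overrides both equals() and hashCode(). It is not taken from the Spark docs or the ticket. The FlowIdDemo wrapper and the countByKey helper are hypothetical names, and a plain HashMap stands in for hash-based grouping so the example runs without Spark. Objects.hash is swapped in for the XOR shown below. Implementing Serializable is an assumption on my part that goes beyond what was discussed here: keys generally have to be serializable to be shuffled.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class FlowIdDemo {

    // Assumption: Serializable is added because shuffled keys generally
    // need to be serializable; it is not part of the equals/hashCode fix.
    static final class FlowId implements Serializable {
        private final String dcxId;
        private final String trxId;
        private final String msgType;

        FlowId(String dcxId, String trxId, String msgType) {
            this.dcxId = dcxId;
            this.trxId = trxId;
            this.msgType = msgType;
        }

        @Override
        public boolean equals(Object other) {
            if (other == this) return true;
            if (other == null || getClass() != other.getClass()) return false;
            FlowId fid = (FlowId) other;
            // Objects.equals is null-safe, unlike calling equals() on a field.
            return Objects.equals(dcxId, fid.dcxId)
                    && Objects.equals(trxId, fid.trxId)
                    && Objects.equals(msgType, fid.msgType);
        }

        @Override
        public int hashCode() {
            // Built from the same fields as equals(), so equal keys
            // always produce the same hash code.
            return Objects.hash(dcxId, trxId, msgType);
        }
    }

    // Hypothetical helper: counts occurrences per key with a HashMap,
    // which relies on hashCode() and equals() to match up equal keys.
    static Map<FlowId, Integer> countByKey(List<FlowId> keys) {
        Map<FlowId, Integer> counts = new HashMap<>();
        for (FlowId key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<FlowId> keys = Arrays.asList(
                new FlowId("d1", "t1", "req"),
                new FlowId("d1", "t1", "req"),
                new FlowId("d2", "t1", "req"));
        // Two distinct keys remain once equal keys are merged.
        System.out.println(countByKey(keys).size()); // prints 2
    }
}
```

Objects.hash is order-sensitive, whereas XOR of the field hashes is symmetric: swapping the values of two fields yields the same XOR result. Both satisfy the contract; the difference only affects how often unequal keys collide.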


On Sun, Jun 15, 2014 at 12:11 PM, Sean Owen <sowen@cloudera.com> wrote:

> In Java at large, you must always implement hashCode() when you implement
> equals(). This is not specific to Spark. The contract is that two instances
> that are equal according to equals() must have the same hash code, and
> that's not the case for your class now. Hash-based structures look keys up
> by hash code first, so equal keys with different hash codes are treated as
> different keys wherever the contract is depended upon.
>
> This probably works fine:
>
> @Override
> public int hashCode() {
>   return dcxId.hashCode() ^ trxId.hashCode() ^ msgType.hashCode();
> }
>
>
> On Sun, Jun 15, 2014 at 11:45 AM, Gaurav Jain <jaing@student.ethz.ch>
> wrote:
>
>> I have a simple Java class as follows, that I want to use as a key while
>> applying groupByKey or reduceByKey functions:
>>
>> private static class FlowId {
>>     public String dcxId;
>>     public String trxId;
>>     public String msgType;
>>
>>     public FlowId(String dcxId, String trxId, String msgType) {
>>         this.dcxId = dcxId;
>>         this.trxId = trxId;
>>         this.msgType = msgType;
>>     }
>>
>>     public boolean equals(Object other) {
>>         if (other == this) return true;
>>         if (other == null) return false;
>>         if (getClass() != other.getClass()) return false;
>>         FlowId fid = (FlowId) other;
>>         if (this.dcxId.equals(fid.dcxId) &&
>>                 this.trxId.equals(fid.trxId) &&
>>                 this.msgType.equals(fid.msgType)) {
>>             return true;
>>         }
>>         return false;
>>     }
>> }
>>
>> I figured that equals() would need to be overridden so that keys are
>> compared correctly, but entries with the same key are still listed
>> separately after applying groupByKey(), for example. What further
>> modifications are necessary to use the above class as a key? Right
>> now, I have fallen back to using Tuple3<String, String, String> instead of
>> the FlowId class, but it makes the code unnecessarily verbose.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Using-custom-class-as-a-key-for-groupByKey-or-reduceByKey-tp7640.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
>
