spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Using custom class as a key for groupByKey() or reduceByKey()
Date Sun, 15 Jun 2014 16:11:54 GMT
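In Java generally, you must override hashCode() whenever you override
equals(). This is not specific to Spark. The contract is that two instances
that are equal according to equals() must return the same hash code, and
that's not the case for your class now: it inherits Object.hashCode(), which
typically differs from one instance to the next. Anything that depends on the
contract then misbehaves, including HashMap and the hash-based grouping behind
groupByKey() and reduceByKey(), which is why equal keys show up separately.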
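
To make the failure concrete, here is a minimal sketch outside Spark. BrokenKey is a hypothetical class invented for this illustration, with the same flaw as FlowId, and a plain HashMap stands in for the grouping step:

```java
import java.util.HashMap;
import java.util.Map;

public class BrokenKeyDemo {
    // Hypothetical key class: overrides equals() but not hashCode().
    public static final class BrokenKey {
        final String id;

        public BrokenKey(String id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof BrokenKey && ((BrokenKey) other).id.equals(id);
        }
        // hashCode() is inherited from Object, so two equal instances
        // will usually report different hash codes.
    }

    // Counts how many times each key occurs, using a plain HashMap.
    public static Map<BrokenKey, Integer> countByKey(BrokenKey... keys) {
        Map<BrokenKey, Integer> counts = new HashMap<>();
        for (BrokenKey k : keys) {
            counts.merge(k, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        BrokenKey a = new BrokenKey("x");
        BrokenKey b = new BrokenKey("x");
        System.out.println("equal: " + a.equals(b));
        // Usually 2 rather than 1: the two keys almost always land in
        // different hash buckets, so equals() is never consulted.
        System.out.println("distinct keys: " + countByKey(a, b).size());
    }
}
```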

This probably works fine:

@Override
public int hashCode() {
  return dcxId.hashCode() ^ trxId.hashCode() ^ msgType.hashCode();
}
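
For completeness, here is a sketch of the whole class with both methods overridden. It swaps the XOR above for java.util.Objects.hash (Java 7+), which is my substitution and not something the original class used: XOR is symmetric, so keys whose fields are the same strings in a different order would collide, while Objects.hash is order-sensitive. The FlowIdDemo wrapper and the countByKey helper are assumptions for illustration only, with a local HashMap standing in for reduceByKey():

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class FlowIdDemo {
    public static final class FlowId {
        public final String dcxId;
        public final String trxId;
        public final String msgType;

        public FlowId(String dcxId, String trxId, String msgType) {
            this.dcxId = dcxId;
            this.trxId = trxId;
            this.msgType = msgType;
        }

        @Override
        public boolean equals(Object other) {
            if (other == this) return true;
            if (other == null || getClass() != other.getClass()) return false;
            FlowId fid = (FlowId) other;
            return Objects.equals(dcxId, fid.dcxId)
                    && Objects.equals(trxId, fid.trxId)
                    && Objects.equals(msgType, fid.msgType);
        }

        @Override
        public int hashCode() {
            // Built from the same fields that equals() compares.
            return Objects.hash(dcxId, trxId, msgType);
        }
    }

    // Hypothetical helper: counts occurrences per key with a HashMap.
    public static Map<FlowId, Integer> countByKey(FlowId... keys) {
        Map<FlowId, Integer> counts = new HashMap<>();
        for (FlowId k : keys) {
            counts.merge(k, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        FlowId a = new FlowId("d1", "t1", "REQ");
        FlowId b = new FlowId("d1", "t1", "REQ");
        // Equal keys now share a hash code and collapse into one entry.
        System.out.println(countByKey(a, b).size()); // prints 1
    }
}
```

The same reasoning should carry over to using FlowId as a key in groupByKey() or reduceByKey(), since those group by hash as well.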


On Sun, Jun 15, 2014 at 11:45 AM, Gaurav Jain <jaing@student.ethz.ch> wrote:

> I have a simple Java class as follows, that I want to use as a key while
> applying groupByKey or reduceByKey functions:
>
> private static class FlowId {
>                 public String dcxId;
>                 public String trxId;
>                 public String msgType;
>
>                 public FlowId(String dcxId, String trxId, String msgType) {
>                         this.dcxId = dcxId;
>                         this.trxId = trxId;
>                         this.msgType = msgType;
>                 }
>
>                 public boolean equals(Object other) {
>                         if (other == this) return true;
>                         if (other == null) return false;
>                         if (getClass() != other.getClass()) return false;
>                         FlowId fid = (FlowId) other;
>                         if (this.dcxId.equals(fid.dcxId) &&
>                                         this.trxId.equals(fid.trxId) &&
>                                         this.msgType.equals(fid.msgType)) {
>                                 return true;
>                         }
>                         return false;
>                 }
> }
>
> I figured that an equals() method would need to be overridden to ensure
> comparison of keys, but still entries with the same key are listed
> separately after applying a groupByKey(), for example. What further
> modifications are necessary to use the above class as a key? Right
> now, I have fallen back to using Tuple3<String, String, String> instead of
> the FlowId class, but it makes the code unnecessarily verbose.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Using-custom-class-as-a-key-for-groupByKey-or-reduceByKey-tp7640.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
