In Java at large, you must always implement hashCode() when you implement equals(); this is not specific to Spark. The contract is that two instances that are equal according to equals() must return the same hash code, and your class currently violates it. Anything that depends on the hash code contract, including hash-based collections, can then misbehave.
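
To make the broken contract concrete, here is a minimal sketch in plain Java, with no Spark involved. The class name BrokenKeyDemo, the nested Key class, and the value "x" are all made up for illustration. It assumes Java 7 or later for the diamond operator.

```java
import java.util.HashSet;
import java.util.Set;

class BrokenKeyDemo {
    // Hypothetical key that overrides equals() but NOT hashCode().
    static final class Key {
        final String id;

        Key(String id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object other) {
            if (other == this) return true;
            if (other == null || getClass() != other.getClass()) return false;
            return id.equals(((Key) other).id);
        }
        // hashCode() is inherited from Object, so it is identity-based.
    }

    public static void main(String[] args) {
        Key a = new Key("x");
        Key b = new Key("x");

        Set<Key> set = new HashSet<>();
        set.add(a);
        set.add(b);

        // a and b are equal, but their inherited hash codes are unrelated,
        // so the set will usually end up holding both of them.
        System.out.println("equal: " + a.equals(b));
        System.out.println("same hash: " + (a.hashCode() == b.hashCode()));
        System.out.println("set size: " + set.size());
    }
}
```

The two keys compare equal, yet a hash-based collection will usually treat them as distinct because it looks at the hash code before it ever calls equals().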

This probably works fine:

public int hashCode() {
  return dcxId.hashCode() ^ trxId.hashCode() ^ msgType.hashCode();
}
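
As an alternative sketch, assuming Java 7 or later, java.util.Objects.hash can combine the three fields. Unlike XOR, it is sensitive to field order, so swapping the values of two fields generally gives a different hash. The wrapper class FlowIdDemo, the helper countByKey, and the sample values are hypothetical; countByKey uses a plain HashMap as a stand-in for hash-based grouping and is not a Spark API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class FlowIdDemo {
    static final class FlowId {
        final String dcxId;
        final String trxId;
        final String msgType;

        FlowId(String dcxId, String trxId, String msgType) {
            this.dcxId = dcxId;
            this.trxId = trxId;
            this.msgType = msgType;
        }

        @Override
        public boolean equals(Object other) {
            if (other == this) return true;
            if (other == null || getClass() != other.getClass()) return false;
            FlowId fid = (FlowId) other;
            return dcxId.equals(fid.dcxId)
                    && trxId.equals(fid.trxId)
                    && msgType.equals(fid.msgType);
        }

        // Built from the same fields equals() compares, so equal
        // instances always return the same hash code.
        @Override
        public int hashCode() {
            return Objects.hash(dcxId, trxId, msgType);
        }
    }

    // Hypothetical helper: counts occurrences per key with a HashMap.
    static Map<FlowId, Integer> countByKey(FlowId... keys) {
        Map<FlowId, Integer> counts = new HashMap<>();
        for (FlowId key : keys) {
            Integer seen = counts.get(key);
            counts.put(key, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        FlowId a = new FlowId("dcx1", "trx1", "REQ");
        FlowId b = new FlowId("dcx1", "trx1", "REQ");

        Map<FlowId, Integer> counts = countByKey(a, b);
        System.out.println(counts.size()); // 1
        System.out.println(counts.get(a)); // 2
    }
}
```

With hashCode() in place, both instances land in the same bucket and are merged into a single entry, which is the behavior a key class needs.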

On Sun, Jun 15, 2014 at 11:45 AM, Gaurav Jain wrote:
I have a simple Java class as follows, that I want to use as a key while
applying groupByKey or reduceByKey functions:

private static class FlowId {
                public String dcxId;
                public String trxId;
                public String msgType;

                public FlowId(String dcxId, String trxId, String msgType) {
                        this.dcxId = dcxId;
                        this.trxId = trxId;
                        this.msgType = msgType;
                }

                public boolean equals(Object other) {
                        if (other == this) return true;
                        if (other == null) return false;
                        if (getClass() != other.getClass()) return false;
                        FlowId fid = (FlowId) other;
                        if (this.dcxId.equals(fid.dcxId) && this.trxId.equals(fid.trxId) &&
                                        this.msgType.equals(fid.msgType)) {
                                return true;
                        }
                        return false;
                }
}

I figured that an equals() method would need to be overridden to ensure
comparison of keys, but still entries with the same key are listed
separately after applying a groupByKey(), for example. What further
modifications are necessary to use the above class as a key? Right
now, I have fallen back to using Tuple3<String, String, String> instead of
the FlowId class, but it makes the code unnecessarily verbose.

Sent from the Apache Spark User List mailing list archive at