Hi,

     I am trying to compute a set difference as follows:

PCollection<MyClass> C = Set.difference(A, B);


Here both A and B are of type PCollection<MyClass>.


MyClass is defined as follows:


public class MyClass implements java.io.Serializable, Cloneable {
    private String a;
    private String b;
    private int c;
    private Map<String, Double> d;
    private int e;

    public MyClass() {
        this(null, null, 0, new HashMap<String, Double>());
    }

    public MyClass(String labelID, String sampleID, Integer pos_neg_ind, HashMap<String, Double> feat_val_pair) {
        ......
    }

    public MyClass(String input) {
        .....
    }

    .....
}


      Running the set difference produces the error below. Is that because MyClass includes a Map member, d? If so, is there another way to compute the set difference for these inputs?


      Thanks!


Lucy


java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: Error while doing final merge
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: Error while doing final merge
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:160)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.avro.AvroRuntimeException: Can't compare maps!
    at org.apache.avro.io.BinaryData.compare(BinaryData.java:134)
    at org.apache.avro.io.BinaryData.compare(BinaryData.java:139)
    at org.apache.avro.io.BinaryData.compare(BinaryData.java:92)
    at org.apache.avro.io.BinaryData.compare(BinaryData.java:72)
    at org.apache.avro.mapred.AvroKeyComparator.compare(AvroKeyComparator.java:43)
    at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:578)
    at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144)
    at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:108)
    at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:524)
    at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:539)
    at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:209)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.finalMerge(MergeManagerImpl.java:731)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.close(MergeManagerImpl.java:370)
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:158)
    ... 7 more
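
The innermost cause in the trace, "Can't compare maps!", is raised by AvroKeyComparator during the shuffle merge. That is consistent with Avro's rule that map data has no sort order, which would explain why a record containing a Map field fails when it is used as a sort key. One possible workaround, sketched here in plain Java rather than with the Crunch API, is to derive a comparable string key from each record and take the difference over those keys. The class and method names below are hypothetical, and the sketch assumes that keys and field values never contain the `|`, `=`, or `;` separators and that the map has no null keys.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Crunch or Avro.
class CanonicalKeySketch {

    // Builds a deterministic string from the five fields of a record.
    // Assumption: no field value contains '|', '=' or ';'.
    static String canonicalKey(String a, String b, int c, Map<String, Double> d, int e) {
        StringBuilder sb = new StringBuilder();
        sb.append(a).append('|').append(b).append('|').append(c).append('|');
        // TreeMap iterates in sorted key order, so two equal maps always
        // produce the same text regardless of their insertion order.
        for (Map.Entry<String, Double> entry : new TreeMap<>(d).entrySet()) {
            sb.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
        }
        sb.append('|').append(e);
        return sb.toString();
    }

    // Returns the keys present in left but not in right.
    static Set<String> difference(Collection<String> left, Collection<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.removeAll(new HashSet<>(right));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> features = new HashMap<>();
        features.put("f1", 0.5);
        String inBoth = canonicalKey("label1", "sample1", 1, features, 0);
        String onlyInA = canonicalKey("label2", "sample2", 0, features, 0);

        Set<String> a = new HashSet<>();
        a.add(inBoth);
        a.add(onlyInA);
        Set<String> b = new HashSet<>();
        b.add(inBoth);

        System.out.println(difference(a, b));
    }
}
```

In a Crunch pipeline the same idea would presumably mean keying each MyClass by such a string before the difference step, so that only strings are compared during the sort. Whether that fits depends on how the PType for MyClass is defined, which is not shown above.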