I am computing the set difference of two Crunch collections:
PCollection<MyClass> C = Set.difference(A, B);
Here A and B are both of type PCollection<MyClass>, and MyClass is defined as follows:
public class MyClass implements java.io.Serializable, Cloneable {
    private String a;
    private String b;
    private int c;
    private Map<String, Double> d;
    private int e;

    public MyClass() {
        this(null, null, 0, new HashMap<String, Double>());
    }

    public MyClass(String labelID, String sampleID, Integer pos_neg_ind, HashMap<String, Double> feat_val_pair) {
        ......
    }

    public MyClass(String input) {
        .....
    }

    .....
}
Running the set difference fails with the error below. Is this because MyClass contains the Map member d? If so, is there another way to compute the set difference from these inputs?
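The stack trace at the end of this post bottoms out in Avro's BinaryData.compare throwing "Can't compare maps!" while the shuffle sorts keys. That is consistent with the Map member d being part of the sort key: Set.difference appears to group on the MyClass values themselves, and Avro does not define a binary sort order for maps. One possible workaround, sketched here under assumptions, is to key each record by a String that flattens every field (with the map entries in sorted key order) and compute the difference on those keys. The sketch is plain JDK Java: Rec is a hypothetical stand-in for MyClass (its accessors are not shown), and the \u0001 separator is assumed never to occur inside field values.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch only: turns each record into a sortable String key (map flattened
// in sorted key order) and computes a set difference on those keys.
public class SetDiffSketch {

    // Hypothetical stand-in for MyClass; field names mirror the post.
    public static final class Rec {
        final String a;
        final String b;
        final int c;
        final Map<String, Double> d;
        final int e;

        public Rec(String a, String b, int c, Map<String, Double> d, int e) {
            this.a = a;
            this.b = b;
            this.c = c;
            this.d = d;
            this.e = e;
        }
    }

    // Assumed never to appear inside any field value.
    private static final char SEP = '\u0001';

    // Deterministic key: equal field values always yield the same string,
    // regardless of the iteration order of the original map.
    public static String toKey(Rec r) {
        StringBuilder sb = new StringBuilder();
        sb.append(r.a).append(SEP).append(r.b).append(SEP)
          .append(r.c).append(SEP).append(r.e);
        if (r.d != null) {
            // TreeMap iterates entries in sorted key order; assumes no null map keys.
            for (Map.Entry<String, Double> en : new TreeMap<>(r.d).entrySet()) {
                sb.append(SEP).append(en.getKey()).append('=').append(en.getValue());
            }
        }
        return sb.toString();
    }

    // Distinct records of left whose key does not occur in right.
    public static List<Rec> difference(List<Rec> left, List<Rec> right) {
        Set<String> rightKeys = new HashSet<>();
        for (Rec r : right) {
            rightKeys.add(toKey(r));
        }
        // LinkedHashMap keeps the first record seen for each key, in input order.
        Map<String, Rec> kept = new LinkedHashMap<>();
        for (Rec r : left) {
            String k = toKey(r);
            if (!rightKeys.contains(k)) {
                kept.putIfAbsent(k, r);
            }
        }
        return new ArrayList<>(kept.values());
    }
}
```

In a Crunch job, one option (an untested assumption) is to build the same key inside a MapFn, attach it with PCollection.by(...), join the two keyed tables with Cogroup.cogroup, and keep the entries whose right-hand side is empty. Only the String key then passes through the Avro comparator, so the map is never compared.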
Thanks!
Lucy
java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: Error while doing final merge
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: Error while doing final merge
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:160)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.avro.AvroRuntimeException: Can't compare maps!
at org.apache.avro.io.BinaryData.compare(BinaryData.java:134)
at org.apache.avro.io.BinaryData.compare(BinaryData.java:139)
at org.apache.avro.io.BinaryData.compare(BinaryData.java:92)
at org.apache.avro.io.BinaryData.compare(BinaryData.java:72)
at org.apache.avro.mapred.AvroKeyComparator.compare(AvroKeyComparator.java:43)
at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:578)
at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144)
at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:108)
at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:524)
at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:539)
at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:209)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.finalMerge(MergeManagerImpl.java:731)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.close(MergeManagerImpl.java:370)
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:158)
... 7 more