I noticed that MRUnit uses the same Serialization for both input and output data. This works fine if the key and value types are Writable.
However, I am currently using Avro-based types (e.g. AvroKey<Long>), following the example at http://stackoverflow.com/questions/15230482/mrunit-with-avro-nullpointerexception-in-serialization .
This works fine if either the input or the output types are Avro types. But if both input and output are Avro-based, only one schema can be used (the one registered in "avro.serialization.key.writer.schema" and "avro.serialization.value.writer.schema").
So if, for example, you have a mapper Mapper<AvroKey<String>, AvroValue<String>, AvroKey<String>, AvroValue<Long>>, the output serialization will fail because it tries to decode a Long as a String.
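To make the failure mode concrete, here is a stdlib-only Java model of the problem. It is not MRUnit or Avro code: a plain Map stands in for Hadoop's Configuration, schemas are represented by Avro primitive type names, and the class and method names are hypothetical. Because a single "avro.serialization.value.writer.schema" entry serves both the input and the output pairs, the mapper's Long output value is checked against the String schema that was registered for the input.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the shared-schema problem; NOT the MRUnit or Avro API.
// A Map<String, String> stands in for Hadoop's Configuration, and schemas are
// represented by Avro primitive type names ("string", "long").
class SharedSchemaModel {
    static final String KEY_WRITER = "avro.serialization.key.writer.schema";
    static final String VALUE_WRITER = "avro.serialization.value.writer.schema";

    // Maps a Java datum to the Avro primitive type name it would require.
    static String avroTypeOf(Object datum) {
        if (datum instanceof String) return "string";
        if (datum instanceof Long) return "long";
        throw new IllegalArgumentException("unsupported type: " + datum.getClass());
    }

    // Models serialization: input and output pairs both read the SAME entry,
    // so a datum only succeeds if it matches whichever schema is configured.
    static boolean canSerialize(Map<String, String> conf, String schemaKey, Object datum) {
        return avroTypeOf(datum).equals(conf.get(schemaKey));
    }

    public static void main(String[] args) {
        // Configured for the mapper's input types: AvroKey<String>, AvroValue<String>.
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY_WRITER, "string");
        conf.put(VALUE_WRITER, "string");

        // The input value is a String and matches the configured value schema.
        System.out.println(canSerialize(conf, VALUE_WRITER, "some text")); // true
        // The output value is a Long, but the only value schema is "string".
        System.out.println(canSerialize(conf, VALUE_WRITER, 42L)); // false
    }
}
```

In the real setup the mismatch surfaces as a serialization error rather than a boolean, but the cause is the same: one writer-schema entry per key/value slot, shared by both directions.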
Normally, when using Avro, the input and output schemas are given by the properties "avro.schema.input.key" and "avro.schema.output.key".
The AvroSerialization class from Avro only uses the "avro.serialization.key.writer.schema" / "avro.serialization.value.writer.schema" schemas; reading and writing the job's input and output is handled by AvroKeyValueInputFormat / AvroKeyValueOutputFormat.
Adding a custom AvroSerialization class that takes the input/output schemas into account to "io.serializations" would not solve the problem, since both serialization classes would accept AvroKey and AvroValue.
Are you planning to include Avro support in the MRUnit API, other than the workaround of defining the schemas as config properties?
(e.g. a method withAvroKeyInputSchema(Schema schema))
I would suggest providing an API for Avro that swaps out "avro.serialization.key.writer.schema" for the corresponding input or output schema whenever new values are added. This way only the addInput() and addOutput() methods would have to change, although this is a hack.
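The suggested schema swap could look roughly like the following stdlib-only sketch. Everything here is hypothetical and not existing MRUnit code: AvroSchemaSwappingDriver and its withAvro*Schema methods are invented names modeled on the withAvroKeyInputSchema idea above, a Map stands in for Hadoop's Configuration, schemas are plain type-name strings, and "serialization" only records which schemas were active. It models the swap at add time only; it does not address how a real driver serializes the mapper's actual output during the run.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed API; NOT existing MRUnit code.
// A Map<String, String> stands in for Hadoop's Configuration.
class AvroSchemaSwappingDriver {
    static final String KEY_WRITER = "avro.serialization.key.writer.schema";
    static final String VALUE_WRITER = "avro.serialization.value.writer.schema";

    private final Map<String, String> conf = new HashMap<>();
    private String inputKeySchema, inputValueSchema, outputKeySchema, outputValueSchema;
    // One "key=<schema>,value=<schema>" entry per serialized pair.
    private final List<String> serialized = new ArrayList<>();

    AvroSchemaSwappingDriver withAvroKeyInputSchema(String s) { inputKeySchema = s; return this; }
    AvroSchemaSwappingDriver withAvroValueInputSchema(String s) { inputValueSchema = s; return this; }
    AvroSchemaSwappingDriver withAvroKeyOutputSchema(String s) { outputKeySchema = s; return this; }
    AvroSchemaSwappingDriver withAvroValueOutputSchema(String s) { outputValueSchema = s; return this; }

    // Point the shared writer-schema entries at the INPUT schemas, then serialize.
    AvroSchemaSwappingDriver addInput(Object key, Object value) {
        swapIn(inputKeySchema, inputValueSchema);
        serialized.add(serialize(key, value));
        return this;
    }

    // Point the shared writer-schema entries at the OUTPUT schemas, then serialize.
    AvroSchemaSwappingDriver addOutput(Object key, Object value) {
        swapIn(outputKeySchema, outputValueSchema);
        serialized.add(serialize(key, value));
        return this;
    }

    private void swapIn(String keySchema, String valueSchema) {
        conf.put(KEY_WRITER, keySchema);
        conf.put(VALUE_WRITER, valueSchema);
    }

    // Stand-in for handing key/value to the real serializer: it only reports
    // which schemas were configured at the moment the pair was added.
    private String serialize(Object key, Object value) {
        return "key=" + conf.get(KEY_WRITER) + ",value=" + conf.get(VALUE_WRITER);
    }

    List<String> serialized() { return serialized; }

    public static void main(String[] args) {
        // Mapper<AvroKey<String>, AvroValue<String>, AvroKey<String>, AvroValue<Long>>
        AvroSchemaSwappingDriver driver = new AvroSchemaSwappingDriver()
            .withAvroKeyInputSchema("string").withAvroValueInputSchema("string")
            .withAvroKeyOutputSchema("string").withAvroValueOutputSchema("long");
        driver.addInput("word", "text").addOutput("word", 1L);
        System.out.println(driver.serialized());
        // prints [key=string,value=string, key=string,value=long]
    }
}
```

The design keeps a single pair of configuration entries, as AvroSerialization expects, and simply rewrites them before each add, which is why it remains a hack rather than real per-direction schema support.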
Do you have suggestions for supporting Avro in a cleaner way without huge effort?
Would you be interested in integrating Avro support into MRUnit?
Do you have any recommendations on how to proceed?
Should I create a JIRA issue and suggest an implementation?