mrunit-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: MRUnit Avro support
Date Tue, 17 Dec 2013 16:23:02 GMT
I have added an answer to SO:
http://stackoverflow.com/questions/15230482/mrunit-with-avro-nullpointerexception-in-serialization/20639370#20639370

Please check out this JIRA:
https://issues.apache.org/jira/browse/MRUNIT-181 and let me know if
that works for you.

Cheers!


On Tue, Dec 17, 2013 at 9:25 AM, Florian Froese <f.froese@gmail.com> wrote:

> Hey guys!
> I noticed that MRUnit uses the same Serialization for adding data as input
> and as output. This works fine if the key and value types are Writable.
> But I am currently using Avro-based types (e.g. AvroKey<Long>), following
> the example at
> http://stackoverflow.com/questions/15230482/mrunit-with-avro-nullpointerexception-in-serialization
> This works fine if either the input or the output uses Avro types. But if
> both input and output are Avro-based, only one schema can be used (the one
> registered in "avro.serialization.key.writer.schema" and
> "avro.serialization.value.writer.schema").
>
> So if, for example, you have a Mapper<AvroKey<String>,
> AvroValue<String>,AvroKey<String>,AvroValue<Long>>, the output serialization
> will fail, since it tries to decode a Long as a String.
>
> Normally, when using Avro, the input and output schemas are given by the
> properties "avro.schema.input.key" and "avro.schema.output.key".
> The AvroSerialization class from Avro only uses the
> "avro.serialization.key.writer.schema" schemas. Input and output are handled
> by the AvroKeyValueInputFormat / OutputFormat.
>
> Adding a custom AvroSerialization that takes the schemas class to
> "io.serializations" would not solve the problem since both classes would
> accept AvroKey and AvroValue.
>
> Are you planning to include Avro support in the MRUnit API, other than
> the workaround of defining the schemas as config properties?
> (e.g. a method withAvroKeyInputSchema(Schema schema))
>
> I would suggest providing an API for Avro and swapping out
> "avro.serialization.key.writer.schema" for the corresponding input/output
> schemas when adding new values. This way only the addInput() and
> addOutput() methods have to be changed, although this is a hack.
> Do you have suggestions for providing Avro support in a cleaner way without
> huge effort?
>
> Would you be interested in integrating Avro support into MRUnit?
> Do you have any recommendations on how to proceed?
> Should I create a JIRA issue and suggest an implementation?
>
> Best regards
> Florian
>
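
The schema swap proposed in the quoted message could look roughly like the sketch below. It is not MRUnit or Avro code: a plain java.util.Map stands in for the Hadoop Configuration, the class SchemaSwapSketch and the helper withWriterSchemas are hypothetical names, and the Supplier stands in for whatever serialization step addInput()/addOutput() would perform. Only the two property names come from the message.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class SchemaSwapSketch {
    // Property names quoted in the message above.
    static final String KEY_WRITER_SCHEMA = "avro.serialization.key.writer.schema";
    static final String VALUE_WRITER_SCHEMA = "avro.serialization.value.writer.schema";

    /**
     * Hypothetical helper: installs the given schema pair, runs the action
     * (a stand-in for the serialization step in addInput()/addOutput()),
     * then restores whatever was configured before.
     */
    static <T> T withWriterSchemas(Map<String, String> conf, String keySchema,
                                   String valueSchema, Supplier<T> action) {
        String oldKey = conf.get(KEY_WRITER_SCHEMA);
        String oldValue = conf.get(VALUE_WRITER_SCHEMA);
        conf.put(KEY_WRITER_SCHEMA, keySchema);
        conf.put(VALUE_WRITER_SCHEMA, valueSchema);
        try {
            return action.get();
        } finally {
            restore(conf, KEY_WRITER_SCHEMA, oldKey);
            restore(conf, VALUE_WRITER_SCHEMA, oldValue);
        }
    }

    // Puts the previous value back, or removes the property if it was unset.
    private static void restore(Map<String, String> conf, String name, String old) {
        if (old == null) {
            conf.remove(name);
        } else {
            conf.put(name, old);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a Hadoop Configuration (assumption, see above).
        Map<String, String> conf = new HashMap<>();

        // Mapper<AvroKey<String>, AvroValue<String>, AvroKey<String>, AvroValue<Long>>:
        // inputs see the string value schema, outputs see the long value schema.
        String seenForInput = withWriterSchemas(conf, "\"string\"", "\"string\"",
                () -> conf.get(VALUE_WRITER_SCHEMA));
        String seenForOutput = withWriterSchemas(conf, "\"string\"", "\"long\"",
                () -> conf.get(VALUE_WRITER_SCHEMA));

        System.out.println(seenForInput + " then " + seenForOutput);
    }
}
```

Restoring the previous values in a finally block keeps the swap from leaking into the rest of the test, which is the main risk of the "hack" the message describes.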



-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
