mrunit-user mailing list archives

From: Florian Froese <f.fro...@gmail.com>
Subject: Re: MRUnit Avro support
Date: Tue, 17 Dec 2013 16:48:45 GMT
The problem starts when adding values to the driver.
It lies in the copy() method in TestDriver.java.
Since there is only one Serialization for input and output, only one schema can be defined
for the key and one for the value.
So if input and output use different Avro schemas, an error is thrown because the object
cannot be serialized with the wrong schema.
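
To make this concrete, here is a rough, untested sketch of the limitation: the driver's
single Configuration can only hold one writer schema per key/value slot, so setting the
schema needed for the output simply overwrites the one needed for the input (the
STRING/LONG schemas here are just examples):

    import org.apache.avro.Schema;
    import org.apache.avro.hadoop.io.AvroSerialization;
    import org.apache.hadoop.conf.Configuration;

    // ... inside a test, with an MRUnit driver already set up ...
    Configuration conf = driver.getConfiguration();
    AvroSerialization.addToConfiguration(conf);

    // writer schema for the *input* value (e.g. AvroValue<String>)
    AvroSerialization.setValueWriterSchema(conf, Schema.create(Schema.Type.STRING));

    // writer schema for the *output* value (e.g. AvroValue<Long>) -- this simply
    // overwrites the previous schema, so copying the inputs now uses the wrong one
    AvroSerialization.setValueWriterSchema(conf, Schema.create(Schema.Type.LONG));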

Cheers
Florian


On 17.12.2013, at 17:23, Brock Noland <brock@cloudera.com> wrote:

> I have added an answer to SO: http://stackoverflow.com/questions/15230482/mrunit-with-avro-nullpointerexception-in-serialization/20639370#20639370
> 
> Please check out this JIRA: https://issues.apache.org/jira/browse/MRUNIT-181 and let me
> know if that works for you.
> 
> Cheers!
> 
> 
> On Tue, Dec 17, 2013 at 9:25 AM, Florian Froese <f.froese@gmail.com> wrote:
> Hey guys!
> I noticed that MRUnit uses the same Serialization for adding data as input and as output.
> This works fine if the key and value types are Writable.
> But I am currently using Avro-based types (e.g. AvroKey<Long>), following the example at
> http://stackoverflow.com/questions/15230482/mrunit-with-avro-nullpointerexception-in-serialization .
> This works fine if either the input or the output uses Avro types. But if both input and
> output are Avro-based, only one schema can be used (the one registered under
> "avro.serialization.key.writer.schema" and "avro.serialization.value.writer.schema").
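>
> Concretely, that workaround boils down to something like the following (an untested
> sketch; the LONG key schema is just an example):
>
>     import org.apache.avro.Schema;
>     import org.apache.avro.hadoop.io.AvroSerialization;
>     import org.apache.hadoop.conf.Configuration;
>
>     // Register AvroSerialization and pin the key schemas in the driver's Configuration.
>     Configuration conf = driver.getConfiguration();
>     AvroSerialization.addToConfiguration(conf);
>     AvroSerialization.setKeyWriterSchema(conf, Schema.create(Schema.Type.LONG));
>     AvroSerialization.setKeyReaderSchema(conf, Schema.create(Schema.Type.LONG));
>     // There is only *one* such pair of properties, shared by inputs and outputs.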
> 
> So if, for example, you have a mapper Mapper<AvroKey<String>, AvroValue<String>, AvroKey<String>, AvroValue<Long>>,
> the output serialization will fail because it tries to decode a Long as a String.
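> For reference, this is the kind of mapper I mean (a simplified, made-up example):
>
>     import java.io.IOException;
>     import org.apache.avro.mapred.AvroKey;
>     import org.apache.avro.mapred.AvroValue;
>     import org.apache.hadoop.mapreduce.Mapper;
>
>     // String values in, Long values out: the value schema changes between input and output.
>     public class LengthMapper
>         extends Mapper<AvroKey<String>, AvroValue<String>, AvroKey<String>, AvroValue<Long>> {
>       @Override
>       protected void map(AvroKey<String> key, AvroValue<String> value, Context context)
>           throws IOException, InterruptedException {
>         context.write(key, new AvroValue<Long>((long) value.datum().length()));
>       }
>     }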
> 
> Normally, when using Avro, the input and output schemas are given by the properties
> "avro.schema.input.key" and "avro.schema.output.key".
> The AvroSerialization class from Avro only uses the "avro.serialization.key.writer.schema"
> schemas; reading and writing are handled by AvroKeyValueInputFormat / AvroKeyValueOutputFormat.
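>
> In a normal job that would look roughly like this (from memory, so treat it as a sketch):
>
>     import org.apache.avro.Schema;
>     import org.apache.avro.mapreduce.AvroJob;
>     import org.apache.avro.mapreduce.AvroKeyValueInputFormat;
>     import org.apache.avro.mapreduce.AvroKeyValueOutputFormat;
>     import org.apache.hadoop.mapreduce.Job;
>
>     Job job = Job.getInstance();
>     // Separate schemas for input and output -- exactly what the single
>     // writer-schema property in MRUnit cannot express.
>     AvroJob.setInputKeySchema(job, Schema.create(Schema.Type.STRING));
>     AvroJob.setInputValueSchema(job, Schema.create(Schema.Type.STRING));
>     AvroJob.setOutputKeySchema(job, Schema.create(Schema.Type.STRING));
>     AvroJob.setOutputValueSchema(job, Schema.create(Schema.Type.LONG));
>     job.setInputFormatClass(AvroKeyValueInputFormat.class);
>     job.setOutputFormatClass(AvroKeyValueOutputFormat.class);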
> 
> Adding a custom AvroSerialization class (one that takes these schemas) to "io.serializations"
> would not solve the problem either, since both serialization classes would accept AvroKey and AvroValue.
> 
> Are you planning to include Avro support in the MRUnit API, beyond the workaround of
> defining the schemas as config properties?
> (e.g. a method withAvroKeyInputSchema(Schema schema))
> 
> I would suggest providing an API for Avro and swapping out "avro.serialization.key.writer.schema"
> with the corresponding input/output schema whenever new values are added. That way only the
> addInput() and addOutput() methods would have to be changed, although this is a hack.
> Do you have suggestions for providing Avro support in a cleaner way without huge effort?
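>
> Just to make the idea concrete, the kind of API I have in mind would look something like
> this (purely hypothetical, none of these with...Schema() methods exist in MRUnit today):
>
>     // Hypothetical API sketch, reusing the example mapper from above.
>     MapDriver<AvroKey<String>, AvroValue<String>, AvroKey<String>, AvroValue<Long>> driver =
>         MapDriver.newMapDriver(new LengthMapper());
>     driver.withAvroKeyInputSchema(Schema.create(Schema.Type.STRING))
>           .withAvroValueInputSchema(Schema.create(Schema.Type.STRING))
>           .withAvroKeyOutputSchema(Schema.create(Schema.Type.STRING))
>           .withAvroValueOutputSchema(Schema.create(Schema.Type.LONG));
>     // addInput()/addOutput() would then swap the "avro.serialization.*.writer.schema"
>     // properties to the matching schema before copying the records.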
> 
> Would you be interested in integrating Avro support into MRUnit?
> Do you have any recommendations on how to proceed?
> Should I create a JIRA issue and suggest an implementation?
> 
> Best regards
> Florian
> 
> 
> 
> -- 
> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org

