mrunit-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: MRUnit Avro support
Date Tue, 17 Dec 2013 18:40:18 GMT
I think that to get around that you can use a separate configuration for the output?

http://mrunit.apache.org/documentation/javadocs/1.0.0/org/apache/hadoop/mrunit/TestDriver.html#withOutputSerializationConfiguration(org.apache.hadoop.conf.Configuration)
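Roughly like this (a sketch only, based on the javadoc above; it assumes MRUnit 1.0.0 plus Avro's avro-mapred module, and a hypothetical Mapper<AvroKey<CharSequence>, AvroValue<CharSequence>, AvroKey<CharSequence>, AvroValue<Long>> — adjust the schemas to your actual types):

```java
import org.apache.avro.Schema;
import org.apache.avro.hadoop.io.AvroSerialization;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;

public class AvroSchemaConfigSketch {

  // Sketch: wire separate input and output serialization configs onto a
  // MapDriver so the input and output Avro schemas can differ.
  static void configure(MapDriver<?, ?, ?, ?> driver) {
    // Input side: register AvroSerialization and the *input* writer
    // schemas on the driver's main configuration.
    Configuration inputConf = driver.getConfiguration();
    AvroSerialization.addToConfiguration(inputConf);
    AvroSerialization.setKeyWriterSchema(inputConf, Schema.create(Schema.Type.STRING));
    AvroSerialization.setValueWriterSchema(inputConf, Schema.create(Schema.Type.STRING));

    // Output side: a *separate* configuration carrying the output
    // schemas, so an AvroValue<Long> output is no longer serialized
    // with the String schema registered for the input.
    Configuration outputConf = new Configuration(inputConf);
    AvroSerialization.setKeyWriterSchema(outputConf, Schema.create(Schema.Type.STRING));
    AvroSerialization.setValueWriterSchema(outputConf, Schema.create(Schema.Type.LONG));
    driver.withOutputSerializationConfiguration(outputConf);
  }
}
```

The idea is that the driver keeps using its main configuration when copying inputs, and uses the configuration passed to withOutputSerializationConfiguration() when copying expected/actual outputs, so the two sides no longer fight over the single "avro.serialization.*.writer.schema" pair.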


On Tue, Dec 17, 2013 at 10:48 AM, Florian Froese <f.froese@gmail.com> wrote:

> The problem starts when adding values to the driver.
> It is in the copy() method in TestDriver.java.
> Since there is only one Serialization for input and output, only one
> schema can be defined for the key and the value.
> Thus, if there are different Avro schemas for input and output, an error
> is thrown since it tries to serialize the object with the wrong schema.
>
> Cheers
> Florian
>
>
>
> On 17.12.2013, at 17:23, Brock Noland <brock@cloudera.com> wrote:
>
> I have added an answer to SO:
> http://stackoverflow.com/questions/15230482/mrunit-with-avro-nullpointerexception-in-serialization/20639370#20639370
>
> Please check out this JIRA:
> https://issues.apache.org/jira/browse/MRUNIT-181 and let me know if that
> works for you.
>
> Cheers!
>
>
> On Tue, Dec 17, 2013 at 9:25 AM, Florian Froese <f.froese@gmail.com> wrote:
>
>> Hey guys!
>> I noticed that MRUnit uses the same Serialization for adding data as
>> input and as output. This works fine if the key and value types are
>> Writable.
>> But I am currently using Avro-based types (e.g. AvroKey<Long>) by
>> following the example at
>> http://stackoverflow.com/questions/15230482/mrunit-with-avro-nullpointerexception-in-serialization .
>> This works fine if either the input or the output uses Avro types. But
>> if both input and output are Avro-based, only one schema can be used
>> (the one registered in "avro.serialization.key.writer.schema" and
>> "avro.serialization.value.writer.schema").
>>
>> So if, for example, you have a Mapper<AvroKey<String>, AvroValue<String>,
>> AvroKey<String>, AvroValue<Long>>, the output serialization will fail
>> since it tries to decode a Long as a String.
>>
>> Normally, when using Avro, the input and output schemas are given by
>> the properties "avro.schema.input.key" and "avro.schema.output.key".
>> The AvroSerialization class from Avro only uses the
>> "avro.serialization.key.writer.schema" schemas; input and output are
>> handled by the AvroKeyValueInputFormat / OutputFormat.
>>
>> Adding a custom AvroSerialization class that takes the schemas to
>> "io.serializations" would not solve the problem, since both classes
>> would accept AvroKey and AvroValue.
>>
>> Are you planning to include Avro support in the MRUnit API, other than
>> the workaround of defining the schemas as config properties?
>> (e.g. a method withAvroKeyInputSchema(Schema schema))
>>
>> I would suggest providing an API for Avro and swapping out
>> "avro.serialization.key.writer.schema" with the corresponding
>> input/output schemas when adding new values. This way only the
>> addInput() and addOutput() methods would have to be changed, although
>> this is a hack.
>> Do you have suggestions for providing Avro support in a cleaner way
>> without huge effort?
>>
>> Would you be interested in integrating Avro support into MRUnit?
>> Do you have any recommendations on how to proceed?
>> Should I create a JIRA issue and suggest an implementation?
>>
>> Best regards
>> Florian
>>
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
>
>
>


-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
