crunch-user mailing list archives

From Josh Wills <>
Subject Re: Injecting alternate PType Converter implementations
Date Wed, 24 Apr 2013 22:29:01 GMT
Hey Micah,

It seems like having the AvroKeyConverter return AvroKey instead of
AvroWrapper is the easiest way to solve this, since AvroKey is a subclass
of AvroWrapper. That said, I agree that it's a thorny problem. We're just
getting ready for the 0.6.0 release, but I'd be fine with getting the
switch in there if it solves this problem for you.
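A minimal, self-contained sketch of why the switch is safe, assuming simplified stand-ins for Avro's AvroWrapper/AvroKey and a hypothetical one-method OutputKeyConverter interface (the real Crunch Converter has more methods; all class names below except AvroWrapper/AvroKey are made up for illustration). Because AvroKey extends AvroWrapper, a converter that returns AvroKey still satisfies callers typed against AvroWrapper, while a writer that casts to AvroKey, as AvroTrevniKeyRecordWriter effectively does, stops throwing ClassCastException:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for org.apache.avro.mapred.AvroWrapper. */
class AvroWrapper<T> {
    private final T datum;
    AvroWrapper(T datum) { this.datum = datum; }
    T datum() { return datum; }
}

/** Hypothetical stand-in for org.apache.avro.mapred.AvroKey, a subclass of AvroWrapper. */
class AvroKey<T> extends AvroWrapper<T> {
    AvroKey(T datum) { super(datum); }
}

/** Hypothetical one-method converter; the real Crunch Converter has more methods. */
interface OutputKeyConverter<S> {
    AvroWrapper<S> outputKey(S value);
}

/** Current behaviour described in the thread: emits a plain AvroWrapper. */
class WrapperEmittingConverter<S> implements OutputKeyConverter<S> {
    @Override
    public AvroWrapper<S> outputKey(S value) { return new AvroWrapper<>(value); }
}

/** Proposed behaviour: covariant return of AvroKey, still an AvroWrapper to callers. */
class KeyEmittingConverter<S> implements OutputKeyConverter<S> {
    @Override
    public AvroKey<S> outputKey(S value) { return new AvroKey<>(value); }
}

/** Hypothetical writer that, like the Trevni key record writer, requires AvroKey. */
class KeyOnlyWriter {
    final List<Object> written = new ArrayList<>();

    void write(AvroWrapper<?> key) {
        // Throws ClassCastException when handed a plain AvroWrapper.
        AvroKey<?> avroKey = (AvroKey<?>) key;
        written.add(avroKey.datum());
    }
}

public class ConverterSketch {
    public static void main(String[] args) {
        KeyOnlyWriter writer = new KeyOnlyWriter();

        // AvroKey passes the cast inside the writer.
        writer.write(new KeyEmittingConverter<String>().outputKey("record-1"));

        // A plain AvroWrapper reproduces the ClassCastException from the thread.
        try {
            writer.write(new WrapperEmittingConverter<String>().outputKey("record-2"));
        } catch (ClassCastException e) {
            System.out.println("rejected plain AvroWrapper");
        }

        System.out.println(writer.written); // [record-1]
    }
}
```

The point of the sketch is the covariant return type: code written against the AvroWrapper signature keeps compiling unchanged, and only writers that actually need AvroKey see the difference.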


On Wed, Apr 24, 2013 at 3:23 PM, Micah Whitacre <> wrote:

> As an alternative to the standard AvroInput/OutputFormat, I've been
> playing around with how to support alternate Avro file types like
> Trevni[1], which are beneficial when we only want to retrieve a subset
> of the Avro object.
> Picking one of the implementations
> (AvroTrevniKeyInputFormat/AvroTrevniKeyOutputFormat)[2], I wrote the
> corresponding Source/Target/SourceTarget classes. When I started trying
> to test it out (to see if I did any of it right), I hit the issue that the
> AvroKeyConverter only produces AvroWrapper objects, while the output format
> requires AvroKey. So I get ClassCastExceptions in the CrunchOutputs.write(...)
> method:
> Caused by: java.lang.ClassCastException: org.apache.avro.mapred.AvroWrapper cannot be cast to org.apache.avro.mapred.AvroKey
>     at org.apache.trevni.avro.mapreduce.AvroTrevniKeyRecordWriter.write(
>     at
> I was hoping that the target would be able to take any PCollection whose
> PType is an AvroType, but it looks like I'd need to implement my own PType,
> and force consumers to use it, just to change the converter to produce
> AvroKey instead.
> Is implementing a custom PType the only way to inject an alternate
> converter? That seems like a high cost on the implementation side, and it
> imposes a restriction on others in the pipeline who are generally happy
> with the standard AvroType and shouldn't be burdened with how the data
> might be stored later in the processing.
> Thoughts?
> [1] -
> [2] -

Director of Data Science
Cloudera <>
Twitter: @josh_wills <>
