crunch-user mailing list archives

From Micah Whitacre <>
Subject Re: Injecting alternate PType Converter implementations
Date Wed, 24 Apr 2013 22:49:20 GMT
Is the change simply:

  private AvroWrapper<K> getWrapper() {
    if (wrapper == null) {
      // wrapper = new AvroWrapper<K>();
      wrapper = new AvroKey<K>();
    }
    return wrapper;
  }

Or are there more changes I might be missing?  Doing that got me past the
ClassCastException (though still trying to get my code working).
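For anyone following the thread, here is a minimal, self-contained sketch of why that one-line change gets past the cast. It does not use the real Avro or Crunch classes: `Wrapper` and `Key` are stand-ins for AvroWrapper and AvroKey (assuming only that AvroKey extends AvroWrapper, as Josh notes below), and `KeyConverter`, `outputKey` and `ConverterSketch` are hypothetical names made up for illustration.

```java
// Stand-in for org.apache.avro.mapred.AvroWrapper (assumption: a simple datum holder).
class Wrapper<T> {
    private T datum;
    public T datum() { return datum; }
    public void datum(T datum) { this.datum = datum; }
}

// Stand-in for org.apache.avro.mapred.AvroKey, a subclass of the wrapper.
class Key<T> extends Wrapper<T> {}

// Hypothetical converter that lazily creates and reuses one wrapper instance.
class KeyConverter<K> {
    private Wrapper<K> wrapper;

    private Wrapper<K> getWrapper() {
        if (wrapper == null) {
            // Instantiating the subclass still satisfies the declared
            // Wrapper<K> return type, so existing callers are unaffected.
            wrapper = new Key<K>();
        }
        return wrapper;
    }

    public Wrapper<K> outputKey(K value) {
        getWrapper().datum(value);
        return wrapper;
    }
}

public class ConverterSketch {
    public static void main(String[] args) {
        Wrapper<String> out = new KeyConverter<String>().outputKey("record");
        // A writer that needs the subclass can now downcast safely; with
        // "new Wrapper<K>()" above, this cast would throw ClassCastException.
        Key<String> key = (Key<String>) out;
        System.out.println(key.datum());
    }
}
```

The same reasoning would not cover an output format that expects a different wrapper subclass, which is where a pluggable converter would still be needed.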

As I indicated I'm still just trying to prove out my code and if it pans
out we can probably wait till the 0.7.0 release (assuming the current ~2
month release cycle).  I'll leave it to you to evaluate the risk.

I'm guessing the converter-injection issue will be more significant if I
try out the other Trevni format[1], where I'd need the converter to support
AvroValue instead of NullWritable.  So I'm fine with holding off on a rushed
change before a release in favor of a more holistic solution to both parts.

[1] -

On Wed, Apr 24, 2013 at 5:29 PM, Josh Wills <> wrote:

> Hey Micah,
> It seems like having the AvroKeyConverter use the AvroKey as the return
> type instead of AvroWrapper is the easiest way to solve this, since AvroKey
> is a subclass of AvroWrapper. That said, I agree, that's a thorny problem.
> We're just getting ready for the 0.6.0 release, but I'd be fine to get the
> switch in there if that solved this problem for you.
> J
> On Wed, Apr 24, 2013 at 3:23 PM, Micah Whitacre <> wrote:
>> As an alternative to the standard AvroInput/OutputFormat, I've been
>> playing around with how to support alternate Avro file types like
>> Trevni[1], which give benefits when we want to only retrieve a subset of
>> the Avro object.
>> Picking one of the implementations
>> (AvroTrevniKeyInputFormat/AvroTrevniKeyOutputFormat)[2], I implemented the
>> various Source/Target/SourceTarget implementations.  When I started trying
>> to test it out (to see if I did any of it right), I hit the issue that the
>> AvroKeyConverter only produces AvroWrapper objects and the output format
>> requires AvroKey.  So I get ClassCastExceptions in the
>> CrunchOutputs.write(...) method.
>> Caused by: java.lang.ClassCastException:
>> org.apache.avro.mapred.AvroWrapper cannot be cast to
>> org.apache.avro.mapred.AvroKey
>> at
>> org.apache.trevni.avro.mapreduce.AvroTrevniKeyRecordWriter.write(
>>  at
>> I was hoping that the target would be able to take any PCollection<?
>> extends AvroType> but it looks like I'd need to implement my own PType and
>> force consumers to use that just to change the converter to produce AvroKey
>> instead.
>> Is implementing a custom PType the only way to inject an alternate
>> converter?  That seems like a high cost on the implementation side and
>> forcing a restriction onto others in the pipeline who are generally happy
>> with the standard AvroType and shouldn't be burdened with how the data
>> might be stored later on in the processing.
>> Thoughts?
>> [1] -
>> [2] -
> --
> Director of Data Science
> Cloudera <>
> Twitter: @josh_wills <>
