crunch-user mailing list archives

From Micah Whitacre <>
Subject Injecting alternate PType Converter implementations
Date Wed, 24 Apr 2013 22:23:51 GMT
As an alternative to the standard AvroInput/OutputFormat, I've been playing
around with how to support alternate Avro file types like Trevni[1], which
give benefits when we want to retrieve only a subset of the Avro object's fields.

Picking one of the implementations
(AvroTrevniKeyInputFormat/AvroTrevniKeyOutputFormat)[2], I implemented the
various Source/Target/SourceTarget implementations.  When I started trying
to test it out (to see if I did any of it right), I hit the issue that the
AvroKeyConverter only produces AvroWrapper objects and the output format
requires AvroKey.  So I get ClassCastExceptions in CrunchOutputs.write(...):

Caused by: java.lang.ClassCastException: org.apache.avro.mapred.AvroWrapper
cannot be cast to org.apache.avro.mapred.AvroKey

I was hoping that the target would be able to take any PCollection<?
extends AvroType> but it looks like I'd need to implement my own PType and
force consumers to use that just to change the converter to produce AvroKey.
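
To make the mismatch concrete, here is a minimal, self-contained sketch. It
does not use the real Avro or Crunch classes: AvroWrapper and AvroKey below
are stand-ins that only mirror the subclass relationship, and writeKey and
convertAndWrite are hypothetical helpers I made up for illustration. It shows
why a converter that emits the base wrapper fails at write time, while one
that emits AvroKey passes through.

```java
import java.util.function.Function;

public class ConverterMismatchSketch {
    // Stand-in for org.apache.avro.mapred.AvroWrapper (not the real class).
    static class AvroWrapper<T> {
        private final T datum;
        AvroWrapper(T datum) { this.datum = datum; }
        T datum() { return datum; }
    }

    // Stand-in for org.apache.avro.mapred.AvroKey, which subclasses AvroWrapper.
    static class AvroKey<T> extends AvroWrapper<T> {
        AvroKey(T datum) { super(datum); }
    }

    // Hypothetical stand-in for an output format that insists on AvroKey keys.
    static String writeKey(Object key) {
        AvroKey<?> avroKey = (AvroKey<?>) key; // fails for a plain AvroWrapper
        return "wrote " + avroKey.datum();
    }

    // Hypothetical helper: run a converter, then hand its output to the writer.
    static String convertAndWrite(Function<String, AvroWrapper<String>> converter,
                                  String record) {
        try {
            return writeKey(converter.apply(record));
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        // Converter that wraps in the base type, like the behavior described above.
        System.out.println(convertAndWrite(AvroWrapper::new, "record")); // ClassCastException
        // Alternate converter that wraps in AvroKey instead.
        System.out.println(convertAndWrite(AvroKey::new, "record"));     // wrote record
    }
}
```

In this sketch the only difference between the two calls is which wrapper the
converter constructs, which is why swapping the converter alone (rather than
the whole PType) looks like the natural extension point.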

Is implementing a custom PType the only way to inject an alternate
converter?  That seems like a high cost on the implementation side, and it
forces a restriction onto others in the pipeline who are generally happy
with the standard AvroType and shouldn't be burdened with how the data
might be stored later on in the processing.


[1] -
[2] -
