kafka-users mailing list archives

From Mark <static.void....@gmail.com>
Subject Re: Avro serialization
Date Tue, 20 Aug 2013 18:09:21 GMT
Thanks Jay. I've already read the paper and the JIRA ticket (haven't read the code), but I'm still
confused about how to integrate this with Kafka.

Say we write an Avro message (the message contains a SHA of the schema) to Kafka and a consumer
pulls off this message. How does the consumer know how to deserialize the message to even be
able to get to the SHA to look up the full schema? Would this require wrapping all messages
in another type of message, like JSON: { hash: <16 bytes>, message: <Avro encoded
message in bytes> }?
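(One common answer, sketched below under assumptions: the fingerprint does not need a schema to read if it is written as a fixed-length binary prefix rather than a JSON wrapper. The consumer slices off the first N bytes, uses them to look up the full schema in a registry, and only then runs the Avro decoder on the rest. The `wrap`/`unwrap` helpers and the 16-byte MD5 fingerprint here are illustrative choices, not a standard Kafka format.)

```python
import hashlib

# Hypothetical envelope layout: [16-byte schema fingerprint][Avro-encoded bytes].
# Because the fingerprint has a fixed length, a consumer can split it off
# without knowing any schema, then fetch the full schema from a repository
# keyed by that fingerprint before invoking the Avro decoder.

FINGERPRINT_LEN = 16  # MD5 digest size; an assumption, not a Kafka standard

def wrap(schema_text: bytes, avro_payload: bytes) -> bytes:
    """Prefix the Avro payload with a fingerprint of its writer schema."""
    fingerprint = hashlib.md5(schema_text).digest()
    return fingerprint + avro_payload

def unwrap(message: bytes) -> tuple[bytes, bytes]:
    """Split a message into (fingerprint, avro_payload) with no schema needed."""
    return message[:FINGERPRINT_LEN], message[FINGERPRINT_LEN:]

# Example round trip with a toy schema and an opaque payload:
schema = b'{"type": "record", "name": "Event", "fields": []}'
message = wrap(schema, b"\x02\x04")
fp, payload = unwrap(message)
assert fp == hashlib.md5(schema).digest()
assert payload == b"\x02\x04"
```

The same fixed-prefix trick answers the Hadoop case: an input format can read the prefix, resolve the schema, and decode, all without the schema being shipped in every record.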

On Aug 20, 2013, at 9:33 AM, Jay Kreps <jay.kreps@gmail.com> wrote:

> This paper has more information on what we are doing at LinkedIn:
> http://sites.computer.org/debull/A12june/pipeline.pdf
> This Avro JIRA has a schema repository implementation similar to the one
> LinkedIn uses:
> https://issues.apache.org/jira/browse/AVRO-1124
> -Jay
> On Tue, Aug 20, 2013 at 7:08 AM, Mark <static.void.dev@gmail.com> wrote:
>> Can someone break down how message serialization would work with Avro?
>> I've read instead of adding a schema to every single event it would be wise
>> to add some sort of fingerprint with each message to identify which schema
>> it should used. What I'm having trouble understanding is, how do we read
>> the fingerprint without a schema? Don't we need the schema to deserialize?
>> Same question goes for working with Hadoop.. how does the input format
>> know which schema to use?
>> Thanks
