samza-dev mailing list archives

From Luis Casillas <luis.casil...@progressfin.com>
Subject Re: Sample code or tutorial for writing/reading Avro type message in Samza
Date Fri, 20 Nov 2015 20:09:05 GMT

We haven’t seriously considered Protocol Buffers.  In general, the tools we’re interested
in have better support for Avro than for protobuf; Avro was designed for storing data in big-data
stores such as HDFS, and many tools for analyzing such data have adopted it.  For example, Hive
comes with Avro support built in.

More generally, we like the design choices that Avro has made:

1. Self-describing container files
2. Easy convertibility to/from JSON
3. Not tightly tied to code generation



We’ve experienced these downsides, however:

1. We’ve been bitten hard by buggy Avro library versions.  You want to stick to the latest
one.
2. Hadoop ships with one such older, buggy version of Avro, and it is a major pain to work
around it.
3. Avro's “one definition file = one schema = one record type” assumption causes us some
trouble.


On 11/20/15, 2:47 AM, "Selina Tech" <swucareer99@gmail.com> wrote:

>Hi, Luis:
>        Thanks a lot for your detailed reply, with your code and the link to
>the Avro schema registry.
>        May I ask a question: have you considered Protocol Buffers as your
>message type?
>
>Sincerely,
>Selina
>
>
>On Thu, Nov 19, 2015 at 2:22 PM, Luis Casillas <
>luis.casillas@progressfin.com> wrote:
>
>>
>> I did a Samza proof of concept project recently and I ended up writing
>> this code:
>>
>> https://gist.github.com/ldcasillas-progreso/871af3c1a1790be975fd
>>
>> In the end, however, I switched the project from Avro to JSON.  The issue
>> is that Avro is designed to work with its self-describing container file
>> format, which embeds the schema used to write the records in the file.
>> Avro’s schema evolution features rely on this embedded schema; when the
>> embedded schema and the reader’s schema are not equal, Avro uses its
>> special rules to translate the old data to the new schema.
>>
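The resolution step described above can be illustrated with a toy model. This is only a sketch using plain Java maps, not the Avro library: `ToySchemaResolution`, `resolve`, and `NO_DEFAULT` are hypothetical names, and it covers just two of Avro's resolution rules (a field the reader expects but the writer did not write takes the reader's default; a field the writer wrote but the reader does not declare is dropped).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: real Avro resolves binary data against Schema objects.
// All names here are hypothetical and exist only for illustration.
final class ToySchemaResolution {
    // Sentinel meaning "the reader's schema declares no default for this field".
    static final Object NO_DEFAULT = new Object();

    /**
     * @param written      field name -> value, as produced with the writer's schema
     * @param readerFields field name -> default value (or NO_DEFAULT) in the reader's schema
     * @return the record as the reader sees it
     */
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerFields.entrySet()) {
            String name = field.getKey();
            if (written.containsKey(name)) {
                // Field present in both schemas: keep the written value.
                result.put(name, written.get(name));
            } else if (field.getValue() != NO_DEFAULT) {
                // Field added in the reader's schema: fall back to its default.
                result.put(name, field.getValue());
            } else {
                // No written value and no default: resolution fails.
                throw new IllegalStateException("no value or default for field " + name);
            }
        }
        // Fields that only the writer knew about are never copied, so they are dropped.
        return result;
    }
}
```

The point of the sketch is that `resolve` needs both inputs: without knowing what the writer actually wrote, the reader has nothing to resolve against.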
>> But when you’re working with Kafka/Samza, there is no container file, so
>> none of the schema evolution tools work.  If you change your Avro schema,
>> you likely won’t be able to read any of the old messages again.
>>
>> There’s a Kafka Avro schema registry project that aims to fix this:
>>
>> https://github.com/confluentinc/schema-registry
>>
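The general idea behind a registry can be sketched as follows: since there is no container file, each message carries a small ID naming the schema it was written with, and the consumer uses that ID to look up the writer's schema. The code below is a minimal sketch using only the JDK. `SchemaIdFraming` is a hypothetical helper, and the byte layout (one marker byte, a 4-byte big-endian ID, then the payload) is an assumption made for illustration, not a statement of the registry project's actual wire format.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper; not part of Samza, Kafka, or Avro.
final class SchemaIdFraming {
    // Assumed layout: 1 marker byte, 4-byte big-endian schema ID, then the payload.
    static final byte MARKER = 0x0;
    static final int HEADER_SIZE = 1 + Integer.BYTES;

    /** Prepends the marker byte and schema ID to an already-encoded payload. */
    static byte[] frame(int schemaId, byte[] payload) {
        return ByteBuffer.allocate(HEADER_SIZE + payload.length)
                .put(MARKER)
                .putInt(schemaId)   // ByteBuffer is big-endian by default
                .put(payload)
                .array();
    }

    /** Reads the schema ID the producer attached to this message. */
    static int schemaId(byte[] message) {
        return checkedHeader(message).getInt();
    }

    /** Returns the encoded record bytes that follow the header. */
    static byte[] payload(byte[] message) {
        checkedHeader(message);
        return Arrays.copyOfRange(message, HEADER_SIZE, message.length);
    }

    // Validates length and marker, leaving the buffer positioned at the schema ID.
    private static ByteBuffer checkedHeader(byte[] message) {
        if (message.length < HEADER_SIZE) {
            throw new IllegalArgumentException("message too short for header");
        }
        ByteBuffer buf = ByteBuffer.wrap(message);
        if (buf.get() != MARKER) {
            throw new IllegalArgumentException("unknown marker byte");
        }
        return buf;
    }
}
```

A consumer would call `schemaId(message)` first, fetch the matching writer schema from wherever schemas are stored, and only then decode `payload(message)` against its own reader schema.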
>> I tried it but the released version just was not mature enough—which is
>> why I ended up using JSON.  But I did write a Serde that encodes/decodes
>> the Avro objects in JSON:
>>
>> https://gist.github.com/ldcasillas-progreso/3611d40d2833aa62c1b3
>>
>> Hope this helps.
>>
>>
>>
>>
>>
>> On 11/17/15, 12:32 AM, "Selina Tech" <swucareer99@gmail.com> wrote:
>>
>> >Dear All:
>> >     Do you know where I can find a tutorial or sample code for writing
>> >Avro messages to Kafka and reading Avro messages from Kafka in
>> >Samza?
>> >      I am wondering how I should serialize a GenericRecord to bytes and
>> >deserialize it.
>> >     Your comments and suggestions are highly appreciated.
>> >
>> >Sincerely,
>> >Selina
>>
>>
>> -----------
>> This message and any files or text attached to it are intended only for
>> the recipients named above, and contain information that is confidential or
>> privileged. If you are not an intended recipient, you must not read, copy,
>> use or disclose this communication. Please also notify the sender by
>> replying to this message, and then delete all copies of it from your system.
>>
>> Este mensaje y cualquier archivo o texto adjunto es dirigido solamente a
>> los destinatarios especificados en el encabezado y contiene información
>> confidencial y/o privilegiada. Si usted no es el destinatario no deberá
>> leer, copiar, usar o divulgar el contenido. Por favor notifique al
>> remitente, respondiendo a esté mensaje y elimine todas las copias del mismo
>> de su sistema.
>>

