samza-dev mailing list archives

From Selina Tech <swucaree...@gmail.com>
Subject Re: Avro vs Protocol buffer for Samza output
Date Thu, 19 Nov 2015 01:53:32 GMT
Hi, Yi:

    I think I found the answer, quoted below:

"The Kafka message format starts with a magic byte indicating what kind of
serialization is used for this message. And if this byte indicates Avro,
you can layout your message as starting with the schemaId and then followed
by message payload. Upon consumption, you can first get the schemaId, query
Avro for the schema given the id, and then use schema to deserialize the
message"
--http://grokbase.com/t/kafka/users/138mdm6tp3/avro-serialization
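The quoted layout (marker byte, then schema ID, then payload) can be sketched with Python's standard `struct` module. It also bears on the size question further down the thread: under this layout each message carries only a small ID, not the schema. This is a minimal sketch under stated assumptions: the marker value `0x00`, the 4-byte big-endian ID, the `PageView` schema and the `REGISTRY` dict standing in for a real registry are all hypothetical, not something the thread specifies. (Confluent's Schema Registry uses a similar 5-byte header of magic byte `0` plus a 4-byte big-endian schema ID.)

```python
import json
import struct
from typing import Tuple

# Assumptions for illustration only; the thread does not specify these values.
MAGIC_AVRO = 0x00               # hypothetical marker byte meaning "Avro, schema-ID framed"
HEADER = struct.Struct(">BI")   # 1-byte magic + 4-byte big-endian schema ID = 5 bytes

# Hypothetical in-memory stand-in for a schema registry: schema ID -> schema JSON.
REGISTRY = {
    42: json.dumps({
        "type": "record",
        "name": "PageView",
        "fields": [
            {"name": "id", "type": "long"},
            {"name": "url", "type": "string"},
        ],
    }),
}


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already Avro-encoded payload with the magic byte and schema ID."""
    return HEADER.pack(MAGIC_AVRO, schema_id) + payload


def unframe(message: bytes) -> Tuple[dict, bytes]:
    """Check the magic byte, look up the writer schema by ID, return (schema, payload)."""
    magic, schema_id = HEADER.unpack_from(message, 0)
    if magic != MAGIC_AVRO:
        raise ValueError(f"unexpected magic byte {magic:#04x}")
    schema = json.loads(REGISTRY[schema_id])
    return schema, message[HEADER.size:]


# Avro binary encoding of {"id": 42, "url": "abc"}: zigzag varint of 42 is 0x54,
# then the string length (3, zigzag 0x06) followed by the UTF-8 bytes.
payload = b"\x54\x06abc"
message = frame(42, payload)
schema, body = unframe(message)
print(len(message) - len(payload))        # per-message overhead: 5
print(len(REGISTRY[42].encode("utf-8")))  # bytes if the schema were embedded instead
```

The consumer pays one registry lookup per schema ID (easily cached), and the producer never ships the schema text itself.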


Thanks again!
Sincerely,
Selina

On Wed, Nov 18, 2015 at 5:43 PM, Selina Tech <swucareer99@gmail.com> wrote:

> Hi, Yi:
>      Thanks for your reply. Do you mean that, apart from the Avro schema
> registry, Avro messages have no advantage over Protocol Buffers messages
> on Kafka?
>
>      BTW, do you know how Kafka implements Avro messages? Does each Avro
> message include the schema or not? The size of Avro messages is a big
> concern for me now.
>
> Sincerely,
> Selina
>
>
>
> On Wed, Nov 18, 2015 at 5:29 PM, Yi Pan <nickpan47@gmail.com> wrote:
>
>> Hi, Selina,
>>
>> Samza's producer/consumer is highly tunable. You can configure it to use
>> a ProtocolBufferSerde class if your messages in Kafka are in Protocol
>> Buffers format. The use of Avro in Kafka is LinkedIn's choice and does not
>> necessarily fit others.
>>
>> As for why LinkedIn uses Avro, here is the biggest reason: LinkedIn uses
>> an Avro schema registry to ensure that producers and consumers are using
>> compatible Avro schema versions. It is LinkedIn's specific way of
>> maintaining compatibility between producer and consumer. ProtoBuf does
>> not seem to have schema registry functionality, and it requires
>> recompilation to make sure the producer and consumer are compatible on the
>> wire format of the message.
>>
>> If you have other ways to maintain compatibility between producers and
>> consumers using ProtoBuf, I don't see why you cannot use ProtoBuf in
>> Samza.
>>
>> Best,
>>
>> -Yi
>>
>> On Wed, Nov 18, 2015 at 3:43 PM, Selina Tech <swucareer99@gmail.com>
>> wrote:
>>
>> > Dear All:
>> >
>> >       I need to use Samza to generate some data into Kafka and then write
>> > it to a Parquet format file. I was asked why I chose Avro as my Samza
>> > output format for Kafka instead of Protocol Buffers, since all of our
>> > data on Kafka is currently in Protocol Buffers.
>> >       I explained that Avro-encoded messages have a smaller encoded
>> > size, need no extra code compilation, are easier to implement, are fast
>> > to serialize/deserialize, and are supported in many languages. However,
>> > some people believe that an encoded Avro message takes as much space as
>> > a Protocol Buffers message, and that with the schema included, it could
>> > be much bigger.
>> >
>> >       Are there any other advantages that made you choose Avro as your
>> > message format in Kafka?
>> >
>> > Sincerely,
>> > Selina
>> >
>>
>
>
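The size debate in Selina's original question comes down to field tags: Avro's binary encoding writes field values back to back in schema order and relies on the reader already knowing the writer's schema, while Protocol Buffers prefixes every field with a tag. The hand-rolled encoders below illustrate this for a hypothetical two-field record (`id` as a long, `url` as a string). They are a sketch for comparison only, not a substitute for the `avro` or `protobuf` libraries: they cover just these two types, and the Protobuf side always emits both fields, ignoring proto3's omission of default values.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit value to unsigned (zigzag), as Avro does for long."""
    return (n << 1) ^ (n >> 63)


def varint(u: int) -> bytes:
    """Unsigned little-endian base-128 varint, used by both Avro and Protobuf."""
    if u < 0:
        raise ValueError("varint expects a non-negative integer")
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def avro_record(id_: int, url: str) -> bytes:
    """Avro binary for a record {id: long, url: string}: values only, in schema order."""
    data = url.encode("utf-8")
    return varint(zigzag(id_)) + varint(zigzag(len(data))) + data


def protobuf_message(id_: int, url: str) -> bytes:
    """Protobuf wire format for `int64 id = 1; string url = 2;` (always emits both fields)."""
    data = url.encode("utf-8")
    tag_id = varint((1 << 3) | 0)   # field 1, wire type 0 (varint)
    tag_url = varint((2 << 3) | 2)  # field 2, wire type 2 (length-delimited)
    # int64 is encoded as its two's-complement uint64 value, so negatives take 10 bytes.
    return tag_id + varint(id_ & 0xFFFFFFFFFFFFFFFF) + tag_url + varint(len(data)) + data


a = avro_record(42, "abc")
p = protobuf_message(42, "abc")
print(a.hex(), len(a))  # 5406616263 5
print(p.hex(), len(p))  # 082a1203616263 7
```

For this record the gap is exactly the two tag bytes. It grows by roughly one byte per field (tags fit in one byte for field numbers up to 15), and negative `int64` values cost 10 bytes in Protobuf unless the field is declared `sint64`. Neither figure includes a schema; the schema-ID framing quoted at the top of the thread is what keeps the schema out of each Avro message.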

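Yi's point about using a schema registry to keep producers and consumers on compatible schema versions can be illustrated with a toy model. The registry refuses a new schema version unless a reader using it can still decode data written with the previous version (Avro-style backward compatibility). This is a hypothetical sketch, not LinkedIn's registry or any real registry API: `ToySchemaRegistry`, `can_read` and the single compatibility mode are made up for illustration, and `can_read` ignores Avro's type promotion, aliases and nested types.

```python
from typing import Dict, List


def can_read(reader: dict, writer: dict) -> bool:
    """Simplified Avro record resolution: every reader field must exist in the writer
    with an identical type, or declare a default. Writer-only fields are skipped."""
    writer_types = {f["name"]: f["type"] for f in writer["fields"]}
    for field in reader["fields"]:
        if field["name"] in writer_types:
            if writer_types[field["name"]] != field["type"]:
                return False  # stricter than Avro, which allows e.g. int -> long
        elif "default" not in field:
            return False  # reader could not fill this field from old data
    return True


class ToySchemaRegistry:
    """Hypothetical in-memory registry: subject name -> list of schema versions."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[dict]] = {}

    def register(self, subject: str, schema: dict) -> int:
        """Accept a schema only if it can read data written with the latest version."""
        versions = self._versions.setdefault(subject, [])
        if versions and not can_read(reader=schema, writer=versions[-1]):
            raise ValueError(f"{subject}: schema is not backward compatible")
        versions.append(schema)
        return len(versions)  # 1-based version number


v1 = {"type": "record", "name": "PageView",
      "fields": [{"name": "id", "type": "long"}]}
v2 = {"type": "record", "name": "PageView",
      "fields": [{"name": "id", "type": "long"},
                 {"name": "url", "type": "string", "default": ""}]}
v3 = {"type": "record", "name": "PageView",
      "fields": [{"name": "id", "type": "long"},
                 {"name": "referrer", "type": "string"}]}  # new field, no default

registry = ToySchemaRegistry()
print(registry.register("pageviews", v1))  # 1
print(registry.register("pageviews", v2))  # 2
try:
    registry.register("pageviews", v3)
except ValueError as err:
    print(err)  # pageviews: schema is not backward compatible
```

The design point is that compatibility is checked centrally, when a schema version is registered, rather than relying on every producer and consumer being rebuilt against the same definitions.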