samza-dev mailing list archives

From Selina Tech <>
Subject Re: Avro vs Protocol buffer for Samza output
Date Thu, 19 Nov 2015 01:43:15 GMT
Hi, Yi:
     Thanks for your reply. Do you mean that Avro messages have no
advantage over Protocol Buffer messages on Kafka, apart from the Avro
schema registry?

     BTW, do you know how Kafka implements Avro messages? Does each Avro
message include the schema or not?  The size of Avro messages is a big
concern for me now.
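
     To make the size concern concrete, here is a minimal sketch in plain
Python (no Avro library) that hand-encodes one record using the Avro binary
encoding rules (zigzag varints for longs, length-prefixed UTF-8 for strings,
fields written in schema order with no tags) and compares the payload size
with the size of the schema JSON. The PageView schema and its field names
are hypothetical, chosen only for illustration. Whether the schema travels
with each message depends on the serializer the producer uses, not on Kafka
itself; this sketch only shows what attaching it to every message would cost.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer the way Avro encodes int/long values."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes map to small unsigned values
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def avro_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


# Hypothetical schema, for illustration only.
SCHEMA = {
    "type": "record",
    "name": "PageView",
    "fields": [
        {"name": "member_id", "type": "long"},
        {"name": "page", "type": "string"},
    ],
}


def encode_page_view(member_id: int, page: str) -> bytes:
    """An Avro record body is just its fields, in schema order, with no tags."""
    return zigzag_varint(member_id) + avro_string(page)


payload = encode_page_view(42, "/home")
schema_bytes = json.dumps(SCHEMA, separators=(",", ":")).encode("utf-8")

print("payload bytes without schema:", len(payload))
print("schema JSON bytes:", len(schema_bytes))
print("payload bytes if the schema were attached:", len(payload) + len(schema_bytes))
```

     For this record the schemaless payload is 7 bytes, far smaller than the
schema JSON, which is why a setup built around a schema registry would
typically send a short schema identifier with each message rather than the
schema itself.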


On Wed, Nov 18, 2015 at 5:29 PM, Yi Pan <> wrote:

> Hi, Selina,
> Samza's producer/consumer is highly tunable. You can configure it to use
> ProtocolBufferSerde class if your messages in Kafka are in ProtocolBuf
> format. The use of Avro in Kafka is LinkedIn's choice and does not
> necessarily fit others.
> For the sake of "why LinkedIn uses Avro", here is the biggest reason:
> LinkedIn uses Avro schema registry to ensure that producer/consumer are
> using compatible Avro schema versions. It is a specific way of maintaining
> compatibility between producer and consumer in LinkedIn. ProtoBuf does not
> seem to have the schema registry functionality and requires re-compilation
> to make sure producer and consumer are compatible on the wire-format of the
> message.
> If you have other ways to maintain the compatibility between producer and
> consumers using ProtoBuf, I don't see why you cannot use ProtoBuf in Samza.
> Best,
> -Yi
> On Wed, Nov 18, 2015 at 3:43 PM, Selina Tech <>
> wrote:
> > Dear All:
> >
> >       I need to generate some data with Samza, write it to Kafka, and
> > then write it to Parquet format files.  I was asked why I chose Avro as
> > my Samza output type to Kafka instead of Protocol Buffer, since currently
> > our data on Kafka are all Protocol Buffer.
> >       I explained that for Avro-encoded messages the encoded size is
> > smaller, no extra code compilation is needed, and implementation is
> > easier.  Avro is also fast to serialize/deserialize and supports a lot of
> > languages.  However, some people believe that an encoded Avro message
> > takes as much space as a Protocol Buffer message, and that with the
> > schema included the size could be much bigger.
> >
> >       I am wondering if there are any other advantages that made you
> > choose Avro as your message type on Kafka?
> >
> > Sincerely,
> > Selina
> >
