samza-dev mailing list archives

From Yi Pan <nickpa...@gmail.com>
Subject Re: Avro vs Protocol buffer for Samza output
Date Thu, 19 Nov 2015 01:29:38 GMT
Hi, Selina,

Samza's producer/consumer is highly tunable. You can configure it to use a
ProtocolBufferSerde class if your messages in Kafka are in Protocol Buffer
format. The use of Avro in Kafka is LinkedIn's choice and does not
necessarily fit everyone.
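As a sketch, wiring a custom serde into Samza is just configuration. The
factory class and the serde name "protobuf" below are hypothetical (Samza does
not ship a Protocol Buffer serde out of the box); only the property keys follow
Samza's conventions:

```properties
# Register a (hypothetical) Protocol Buffer serde factory under the name "protobuf"
serializers.registry.protobuf.class=com.example.samza.ProtobufSerdeFactory

# Tell the Kafka system to use that serde for message bodies
systems.kafka.samza.msg.serde=protobuf
```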

As for why LinkedIn uses Avro, here is the biggest reason: LinkedIn uses an
Avro schema registry to ensure that producers and consumers are using
compatible Avro schema versions. It is LinkedIn's particular way of
maintaining compatibility between producer and consumer. ProtoBuf does not
seem to have schema registry functionality, and it requires re-compilation
to make sure producer and consumer agree on the wire format of the message.
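To illustrate what the registry approach buys you, here is a minimal sketch
(not LinkedIn's actual implementation) of a common framing convention: each
Kafka message carries a small prefix with a schema id, so a consumer can look
up the writer's exact schema at runtime instead of relying on compiled-in
classes. The magic byte and 5-byte header layout here are assumptions for
illustration:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class SchemaIdFraming {
    static final byte MAGIC = 0x0;  // format marker for this framing version

    // Prepend [magic byte][4-byte schema id] to the Avro binary payload.
    static byte[] frame(int schemaId, byte[] avroPayload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + avroPayload.length);
        buf.put(MAGIC).putInt(schemaId).put(avroPayload);
        return buf.array();
    }

    // Recover the schema id; a consumer would use it to fetch the writer's
    // schema from the registry before decoding the payload.
    static int schemaIdOf(byte[] framed) {
        ByteBuffer buf = ByteBuffer.wrap(framed);
        if (buf.get() != MAGIC) throw new IllegalArgumentException("bad magic byte");
        return buf.getInt();
    }

    // Strip the 5-byte header, leaving the raw Avro bytes.
    static byte[] payloadOf(byte[] framed) {
        return Arrays.copyOfRange(framed, 5, framed.length);
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3};
        byte[] framed = frame(42, payload);
        System.out.println(schemaIdOf(framed));                        // prints 42
        System.out.println(Arrays.equals(payloadOf(framed), payload)); // prints true
    }
}
```

Note the header adds only five bytes per message; the full schema never
travels with the data, which is why the registry approach keeps encoded
messages small while still letting producers and consumers evolve
independently.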

If you have another way to maintain compatibility between producers and
consumers using ProtoBuf, I don't see why you cannot use ProtoBuf in Samza.

Best,

-Yi

On Wed, Nov 18, 2015 at 3:43 PM, Selina Tech <swucareer99@gmail.com> wrote:

> Dear All:
>
>       I need to generate some data with Samza, write it to Kafka, and then
> write it out as a Parquet-format file. I was asked why I chose Avro as my
> Samza output type to Kafka instead of Protocol Buffers, since all of our
> data on Kafka is currently in Protocol Buffers.
>       I explained the advantages of Avro-encoded messages: the encoded size
> is smaller, no extra code compilation is needed, implementation is easier,
> serialization/deserialization is fast, and many languages are supported.
> However, some people believe an encoded Avro message takes as much space as
> Protocol Buffers, and that with the schema included the size could be much
> bigger.
>
>       I am wondering if there are any other advantages that made you choose
> Avro as your message type on Kafka?
>
> Sincerely,
> Selina
>
