beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Raghu Angadi (JIRA)" <>
Subject [jira] [Commented] (BEAM-1573) KafkaIO does not allow using Kafka serializers and deserializers
Date Mon, 06 Mar 2017 09:18:33 GMT


Raghu Angadi commented on BEAM-1573:

There are two levels of solutions to deserializer (and serializer): 
  # Reasonable ways to use of custom Kafka deserializers & serializers
    * This is very feasible now, including the case when you are reading from multiple topics.
  # Update to KafkaIO API to pass Kafka serializers directly to the Kafka consumer.
     * We might end up doing this, not exactly how you proposed, but rather replacing coders
with Kafka (de)serializers. There is no need to include both I think. 
     * There is a discussion on Beam mailing lists about removing use of coders directly in
sources and other places and that might be right time to add this support. (cc [~jkff])

Are you more interested 1 or 2? 

One way to use any Kafka serializer (for (1)): 
PCollection<KafkaRecord<byte[], byte[]> kafkaRecords = // Note that KafkaRecord include
topic name, partition etc.
    .apply(KafkaIO.<byte[] >read()

kafkaRecords.apply( ParDo.of(new DoFn<KafkaRecord<byte[], byte[]>, MyAvroRecord)
   private final Map<String, Object> config = // config 
   private transient Deserializer kafkaDeserializer;
   public void setup() {
      kafkaDeserializer = new MyDeserializer();
     kafkaDeserializer.configure(config) // kafka config (serializable map)

    public void procesElement(Context context) {
       MyAvroRecord record = kafkaDeserializer.deserialize(context.element().getTopic(), context.element().getValue())
   public void tearDown() {


> KafkaIO does not allow using Kafka serializers and deserializers
> ----------------------------------------------------------------
>                 Key: BEAM-1573
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-extensions
>    Affects Versions: 0.4.0, 0.5.0
>            Reporter: peay
>            Assignee: Raghu Angadi
>            Priority: Minor
> KafkaIO does not allow to override the serializer and deserializer settings of the Kafka
consumer and producers it uses internally. Instead, it allows to set a `Coder`, and has a
simple Kafka serializer/deserializer wrapper class that calls the coder.
> I appreciate that allowing to use Beam coders is good and consistent with the rest of
the system. However, is there a reason to completely disallow to use custom Kafka serializers
> This is a limitation when working with an Avro schema registry for instance, which requires
custom serializers. One can write a `Coder` that wraps a custom Kafka serializer, but that
means two levels of un-necessary wrapping.
> In addition, the `Coder` abstraction is not equivalent to Kafka's `Serializer` which
gets the topic name as input. Using a `Coder` wrapper would require duplicating the output
topic setting in the argument to `KafkaIO` and when building the wrapper, which is not elegant
and error prone.

This message was sent by Atlassian JIRA

View raw message