beam-commits mailing list archives

From Jean-Baptiste Onofré (JIRA) <>
Subject [jira] [Commented] (BEAM-881) Provide a PTransform in IOs providing a "standard" Avro IndexedRecord
Date Wed, 02 Nov 2016 18:06:58 GMT


Jean-Baptiste Onofré commented on BEAM-881:

It's invasive in the sense that each IO would have to add the corresponding PTransform.

1. It's an optional change (not enforced), so it's up to the IO provider to decide whether to provide
such a PTransform or not.
2. It doesn't change the IO core code itself: it's a PTransform in a transform package for
3. The user doesn't have to use the PTransform if they don't want or need it. On the other hand,
if they want to use it, they have to explicitly add the PTransform to their pipeline.

> Provide a PTransform in IOs providing a "standard" Avro IndexedRecord
> ---------------------------------------------------------------------
>                 Key: BEAM-881
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-extensions
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Jean-Baptiste Onofré
> Currently, each IO uses a different data format. For instance, {{JmsIO.Read}} provides
a {{PCollection}} of {{JmsRecord}} (and {{JmsIO.Write}} also expects a {{JmsRecord}}), while {{KafkaIO.Read}}
provides a {{PCollection}} of {{KafkaRecord}}.
> Manipulating these IO-specific data formats can seem a bit "complex" to users: some
users may expect a standard format.
> Without modifying the existing IOs, we could add a {{PTransform}} (as part of each IO)
that a user can optionally use. This transform would convert the IO data format ({{JmsRecord}},
for instance) to a standard Avro {{IndexedRecord}}.
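
The shape of the proposed conversion can be sketched in plain Java. This is only an illustration under stated assumptions, not Beam or Avro code: the nested `IndexedRecord` interface is a simplified stand-in that mirrors only the positional `get`/`put` accessors of Avro's `IndexedRecord` (the schema is omitted), `MessageRecord` is a hypothetical IO-specific record playing the role of something like `JmsRecord`, and a `java.util.function.Function` stands in for the optional `PTransform`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class IndexedRecordSketch {

    // Simplified stand-in for Avro's IndexedRecord: positional access only, no schema.
    interface IndexedRecord {
        Object get(int i);
        void put(int i, Object v);
    }

    // Minimal array-backed implementation of the stand-in interface.
    static final class ArrayRecord implements IndexedRecord {
        private final Object[] fields;

        ArrayRecord(int size) {
            this.fields = new Object[size];
        }

        @Override
        public Object get(int i) {
            return fields[i];
        }

        @Override
        public void put(int i, Object v) {
            fields[i] = v;
        }
    }

    // Hypothetical IO-specific record (the role JmsRecord or KafkaRecord plays in an IO).
    static final class MessageRecord {
        final String id;
        final String payload;

        MessageRecord(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // Stand-in for the optional transform: shipped next to the IO, applied only on request.
    // Field 0 carries the id, field 1 the payload (an arbitrary layout chosen for this sketch).
    static final Function<MessageRecord, IndexedRecord> TO_INDEXED = record -> {
        IndexedRecord out = new ArrayRecord(2);
        out.put(0, record.id);
        out.put(1, record.payload);
        return out;
    };

    // The "pipeline" step a user adds explicitly; the read side itself is left unchanged.
    static List<IndexedRecord> convertAll(List<MessageRecord> input) {
        return input.stream().map(TO_INDEXED).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<MessageRecord> read = Arrays.asList(
                new MessageRecord("id-1", "hello"),
                new MessageRecord("id-2", "world"));
        for (IndexedRecord r : convertAll(read)) {
            System.out.println(r.get(0) + " -> " + r.get(1));
        }
    }
}
```

The point of the sketch is that the IO-specific type stays untouched and the conversion is a separate, opt-in step. In a real IO the same mapping would presumably live in a `PTransform` and produce a schema-carrying Avro record.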

This message was sent by Atlassian JIRA
