beam-commits mailing list archives

From "Francis UPTON (JIRA)" <>
Subject [jira] [Commented] (BEAM-881) Provide a PTransform in IOs providing a "standard" Avro IndexedRecord
Date Wed, 02 Nov 2016 16:43:59 GMT


Francis UPTON commented on BEAM-881:

That's fine; though I don't understand why this is considered an invasive change. Each IO
can be accompanied by a PTransform class that does the conversion. The name of this class
can be based on the name of the I/O class, and it can be located in the same place. That's
it; there is no change to Beam itself. It would just be a convention for I/O developers. No
one would be forced to use these PTransforms, but they would be available to anyone who wanted
them.

Avro (and other similar formats) is good at this sort of thing, so I don't understand why this
would not be considered "a really useful tuple type". Can you explain what additional things
you would like to see (or not see)?

> Provide a PTransform in IOs providing a "standard" Avro IndexedRecord
> ---------------------------------------------------------------------
>                 Key: BEAM-881
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-extensions
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Jean-Baptiste Onofré
> Currently, each IO uses a different data format. For instance, {{JmsIO.Read}} provides
> a {{PCollection}} of {{JmsRecord}} (and {{JmsIO.Write}} also expects a {{JmsRecord}}), while
> {{KafkaIO.Read}} provides a {{PCollection}} of {{KafkaRecord}}.
> Manipulating such IO-specific data formats can seem a bit "complex" to users; some
> users may expect a kind of standard format.
> Without modifying the existing IOs, we could add a {{PTransform}} (as part of each IO)
> that a user can optionally use. This transform would convert the IO data format (say, {{JmsRecord}})
> to a standard Avro {{IndexedRecord}}.
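
For illustration only, the conversion proposed in the issue might look roughly like the
sketch below. It uses plain Java stand-ins so that it compiles without Beam or Avro on the
classpath: MessageRecord, SimpleIndexedRecord, and MessageRecordToIndexedRecord are
hypothetical names, not Beam or Avro classes. SimpleIndexedRecord only mimics the positional
get/put access of Avro's IndexedRecord and carries no schema. In an actual IO, the same
mapping would presumably sit inside a PTransform shipped next to the IO class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for an IO-specific record type such as JmsRecord.
final class MessageRecord {
    final String messageId;
    final long timestamp;
    final String payload;

    MessageRecord(String messageId, long timestamp, String payload) {
        this.messageId = messageId;
        this.timestamp = timestamp;
        this.payload = payload;
    }
}

// Hypothetical, simplified stand-in for Avro's IndexedRecord:
// field values are read and written by position.
final class SimpleIndexedRecord {
    private final List<String> fieldNames;
    private final Object[] values;

    SimpleIndexedRecord(List<String> fieldNames) {
        this.fieldNames = fieldNames;
        this.values = new Object[fieldNames.size()];
    }

    void put(int i, Object v) { values[i] = v; }

    Object get(int i) { return values[i]; }

    List<String> fieldNames() { return fieldNames; }
}

// Converter named after the record type it handles, following the
// naming convention suggested in the comment (an assumption of this sketch).
final class MessageRecordToIndexedRecord
        implements Function<MessageRecord, SimpleIndexedRecord> {

    static final List<String> FIELDS =
            Arrays.asList("messageId", "timestamp", "payload");

    @Override
    public SimpleIndexedRecord apply(MessageRecord in) {
        SimpleIndexedRecord out = new SimpleIndexedRecord(FIELDS);
        out.put(0, in.messageId);
        out.put(1, in.timestamp); // autoboxed to Long
        out.put(2, in.payload);
        return out;
    }
}
```

A caller that wants the standard format applies the converter to each record; a caller that
prefers the IO-specific type simply does not use it, so the IO itself stays unchanged.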

This message was sent by Atlassian JIRA
