Eugene Kirpichov created BEAM-2536:
--------------------------------------
Summary: Simplify specifying coders on PCollectionTuple
Key: BEAM-2536
URL: https://issues.apache.org/jira/browse/BEAM-2536
Project: Beam
Issue Type: Bug
Components: sdk-java-core
Reporter: Eugene Kirpichov
Currently when using a multi-output ParDo, the user usually has to do one of the following:
1) Use an anonymous class, new TupleTag<Foo>() {}, to reify the Foo type so that
coder inference works. A frequent problem in this case is that the anonymous class captures
a large enclosing class and either fails to serialize at all or serializes to something
bulky.
2) Explicitly call tuple.get(myTag).setCoder(...).
Both of these are suboptimal.
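
To illustrate why option 1 needs the trailing {}, here is a minimal, stdlib-only sketch. SketchTag and ReifyDemo are hypothetical stand-ins and not Beam classes; the sketch only assumes that the type is recovered from the generic superclass of the anonymous subclass, which is the general Java technique for reifying a type argument.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical stand-in for a tag class; this is NOT Beam's TupleTag.
class SketchTag<T> {
    // Returns the type argument recorded in the generic superclass,
    // or null when the tag was created without a subclass (type erased).
    Type elementType() {
        Type superclass = getClass().getGenericSuperclass();
        if (superclass instanceof ParameterizedType) {
            return ((ParameterizedType) superclass).getActualTypeArguments()[0];
        }
        return null;
    }
}

class ReifyDemo {
    // Plain instantiation: the String type argument is erased at runtime.
    static Type plain() {
        return new SketchTag<String>().elementType();
    }

    // Anonymous subclass: the class file records SketchTag<String> as its
    // generic superclass, so the type argument can be read back.
    static Type anonymous() {
        return new SketchTag<String>() {}.elementType();
    }

    public static void main(String[] args) {
        System.out.println(plain());      // prints "null"
        System.out.println(anonymous());  // prints "class java.lang.String"
    }
}
```

When such an anonymous subclass is created inside an instance method, it is an inner class of the enclosing object, which is where the capture problem described in option 1 comes from.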
Could we have e.g. a constructor for TupleTag that explicitly takes a TypeDescriptor? Or even
a Coder? Or a family of factory methods for TupleTagList that take these? E.g.:
in.apply(ParDo.of(...).withOutputTags(mainTag, TupleTagList.of(side1, FooCoder.of()).and(side2,
BarCoder.of())));
I would suggest both: the TupleTag constructor should optionally take a TypeDescriptor, and
TupleTagList.of() and .and() should optionally take a Coder.
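
A rough sketch of the first half of this suggestion, under stated assumptions: ExplicitTag and ExplicitTagDemo are hypothetical names, and a plain Class<T> stands in for a TypeDescriptor. It is not an existing Beam API, only an illustration of passing the type explicitly so that no anonymous subclass is needed.

```java
// Hypothetical sketch of a tag that carries its element type explicitly.
// Class<T> is an assumed stand-in for a TypeDescriptor; not a Beam class.
final class ExplicitTag<T> {
    private final String id;
    private final Class<T> type;

    ExplicitTag(String id, Class<T> type) {
        this.id = id;
        this.type = type;
    }

    String id() {
        return id;
    }

    // The type is available without reflection on an anonymous subclass.
    Class<T> type() {
        return type;
    }
}

class ExplicitTagDemo {
    // No trailing {} is needed, so no anonymous class is created and
    // nothing from an enclosing instance can be captured.
    static ExplicitTag<String> side() {
        return new ExplicitTag<>("side1", String.class);
    }

    public static void main(String[] args) {
        ExplicitTag<String> tag = side();
        System.out.println(tag.id() + " -> " + tag.type().getName());
    }
}
```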
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)