beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-2536) Simplify specifying coders on PCollectionTuple
Date Wed, 28 Jun 2017 21:08:00 GMT
Eugene Kirpichov created BEAM-2536:
--------------------------------------

             Summary: Simplify specifying coders on PCollectionTuple
                 Key: BEAM-2536
                 URL: https://issues.apache.org/jira/browse/BEAM-2536
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core
            Reporter: Eugene Kirpichov


Currently, when using a multi-output ParDo, the user usually has to do one of the following:

1) Use an anonymous subclass, new TupleTag<Foo>() {}, to reify the Foo type so that coder
inference works. A frequent problem here is that the anonymous class captures a large enclosing
class, and then either fails to serialize at all or serializes to something bulky.
2) Explicitly set the coder on each output: tuple.get(myTag).setCoder(...)

Both of these are suboptimal.
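
To make the first problem concrete, here is a minimal sketch in plain Java with no Beam dependency. Tag, PipelineBuilder, and serializes are hypothetical stand-ins invented for this illustration; Tag only imitates the way an anonymous subclass reifies its type argument and is not Beam's TupleTag. The Class-taking constructor is likewise an assumption, shown only to contrast with the anonymous form.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

class TagSketch {
    /** Hypothetical stand-in for TupleTag (not the Beam class). */
    static class Tag<T> implements Serializable {
        // Null when the type has to be recovered from an anonymous subclass.
        private final Class<T> explicitType;

        Tag() { this.explicitType = null; }

        // Assumed constructor for illustration: the caller states the type directly.
        Tag(Class<T> explicitType) { this.explicitType = explicitType; }

        /** Returns the element type, or null when it has been erased. */
        Type elementType() {
            if (explicitType != null) {
                return explicitType;
            }
            // An anonymous subclass records Tag<Foo> as its generic superclass.
            Type superType = getClass().getGenericSuperclass();
            if (superType instanceof ParameterizedType) {
                return ((ParameterizedType) superType).getActualTypeArguments()[0];
            }
            return null; // plain `new Tag<Foo>()`: nothing left to inspect
        }
    }

    /** Hypothetical enclosing class that is not Serializable. */
    static class PipelineBuilder {
        Tag<String> anonymousTag() {
            // Created in an instance method, so the anonymous class
            // holds a reference to the enclosing PipelineBuilder.
            return new Tag<String>() {};
        }

        Tag<String> explicitTag() {
            return new Tag<>(String.class); // no subclass, no captured instance
        }
    }

    /** Returns true if Java serialization accepts the object. */
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false; // includes NotSerializableException
        }
    }

    public static void main(String[] args) {
        PipelineBuilder builder = new PipelineBuilder();
        System.out.println("plain tag type:       " + new Tag<String>().elementType());
        System.out.println("anonymous tag type:   " + builder.anonymousTag().elementType());
        System.out.println("explicit tag type:    " + builder.explicitTag().elementType());
        System.out.println("anonymous serializes: " + serializes(builder.anonymousTag()));
        System.out.println("explicit serializes:  " + serializes(builder.explicitTag()));
    }
}
```

In this sketch the anonymous form recovers the type but drags the enclosing object into serialization, while the explicit form keeps the type without any subclass.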

Could we have, e.g., a constructor for TupleTag that explicitly takes a TypeDescriptor? Or even
a Coder? Or a family of factory methods for TupleTagList that take these? E.g.:

    in.apply(ParDo.of(...).withOutputTags(mainTag,
        TupleTagList.of(side1, FooCoder.of()).and(side2, BarCoder.of())));

I would suggest both: the TupleTag constructor should optionally take a TypeDescriptor, and
TupleTagList.of() and .and() should optionally take a Coder.
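
One possible shape for the coder-taking factory methods is sketched below in plain Java. Every name here (TagListSketch, Coder, Tag, TagList, coderFor) is a hypothetical stand-in for illustration; none of this is existing Beam API, and a real implementation would have to fit into TupleTagList and PCollectionTuple.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TagListSketch {
    /** Hypothetical stand-in for a Beam Coder. */
    interface Coder<T> {
        String name();
    }

    /** Hypothetical stand-in for TupleTag: an id only, no type reification. */
    static final class Tag<T> {
        final String id;

        Tag(String id) { this.id = id; }
    }

    /** Hypothetical immutable tag list that remembers a coder per tag. */
    static final class TagList {
        private final Map<Tag<?>, Coder<?>> coders;

        private TagList(Map<Tag<?>, Coder<?>> coders) { this.coders = coders; }

        /** Starts a list with one tag and its coder; T ties the two together. */
        static <T> TagList of(Tag<T> tag, Coder<T> coder) {
            return new TagList(new LinkedHashMap<>()).and(tag, coder);
        }

        /** Returns a new list with one more tag and coder; this list is unchanged. */
        <T> TagList and(Tag<T> tag, Coder<T> coder) {
            Map<Tag<?>, Coder<?>> copy = new LinkedHashMap<>(coders);
            copy.put(tag, coder);
            return new TagList(copy);
        }

        /** Looks up the coder registered for a tag, or null if there is none. */
        @SuppressWarnings("unchecked")
        <T> Coder<T> coderFor(Tag<T> tag) {
            // Safe because of()/and() only ever pair a Tag<T> with a Coder<T>.
            return (Coder<T>) coders.get(tag);
        }

        int size() { return coders.size(); }
    }

    public static void main(String[] args) {
        Tag<String> side1 = new Tag<>("side1");
        Tag<Integer> side2 = new Tag<>("side2");
        Coder<String> stringCoder = () -> "StringCoder";
        Coder<Integer> intCoder = () -> "IntCoder";

        TagList tags = TagList.of(side1, stringCoder).and(side2, intCoder);
        System.out.println(side1.id + " -> " + tags.coderFor(side1).name());
        System.out.println(side2.id + " -> " + tags.coderFor(side2).name());
    }
}
```

Because of() and and() share the type parameter between tag and coder, pairing a Tag<String> with a Coder<Integer> would be rejected at compile time, and no anonymous subclass is needed to carry the type.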



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
