beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2993) AvroIO.write without specifying a schema
Date Fri, 29 Sep 2017 18:09:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186163#comment-16186163
] 

Eugene Kirpichov commented on BEAM-2993:
----------------------------------------

The error message says that your inner class AvroIOTransformTest$AvroIOWriteTransformTest
is not Serializable, and indeed it isn't. To debug serialization issues, you can run the
JVM with -Dsun.io.serialization.extendedDebugInfo=true; the exception will then show the
exact path from the top-level object being serialized to the value that is not serializable.

This is most likely because GenericRecordAvroDestinations is not declared static, so it is a
regular inner class and captures the enclosing test class instance.
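
To illustrate the difference, here is a minimal sketch using only the JDK. The names
EnclosingTest, InnerDestinations, NestedDestinations and canSerialize are hypothetical
stand-ins, not the classes from your test or anything in Beam. It assumes the enclosing
class is not Serializable, as in your case, and that the inner class touches state of the
enclosing instance.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the test class; deliberately NOT Serializable.
class EnclosingTest {
    private final String name = "enclosing";

    // Non-static inner class: each instance keeps a hidden reference to the
    // EnclosingTest instance that created it, and that reference is
    // serialized along with the instance.
    class InnerDestinations implements Serializable {
        private static final long serialVersionUID = 1L;

        String describe() {
            return name; // reads a field of the enclosing instance
        }
    }

    // Static nested class: no reference to any enclosing instance.
    static class NestedDestinations implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    InnerDestinations newInner() {
        return new InnerDestinations();
    }

    // Returns true if plain Java serialization of obj succeeds.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("inner:  " + canSerialize(new EnclosingTest().newInner()));
        System.out.println("nested: " + canSerialize(new NestedDestinations()));
    }
}
```

Here the inner instance fails to serialize because its captured EnclosingTest is not
Serializable, while the static nested instance serializes fine. Declaring the class static
(or moving it to the top level) is the usual fix.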

Other than that: I don't quite understand this example. In the example, you definitely already
have the schema available via "SCHEMA". I mean - I understand what your example does, but
I don't see how it motivates the need for a schemaless write(), because in this example the
schema is known, and I'm having a hard time coming up with an example where it wouldn't be
known.

> AvroIO.write without specifying a schema
> ----------------------------------------
>
>                 Key: BEAM-2993
>                 URL: https://issues.apache.org/jira/browse/BEAM-2993
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>
> Similarly to https://issues.apache.org/jira/browse/BEAM-2677, we should be able to write
> to avro files using {{AvroIO}} without specifying a schema at build time. Consider the
> following use case: a user has a {{PCollection<GenericRecord>}} but the schema is only
> known while running the pipeline. {{AvroIO.writeGenericRecords}} needs the schema, but the
> schema is already available in {{GenericRecord}}. We should be able to call
> {{AvroIO.writeGenericRecords()}} with no schema.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
