beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)" <>
Subject [jira] [Commented] (BEAM-2993) AvroIO.write without specifying a schema
Date Fri, 29 Sep 2017 18:09:00 GMT


Eugene Kirpichov commented on BEAM-2993:

The error message says that your inner class AvroIOTransformTest$AvroIOWriteTransformTest
is not Serializable - and indeed it isn't. To debug serialization issues, you can run the
JVM with and it will tell you exactly what is
the path from a top-level object that needs to be serialized, to the value that is not serializable.

It's most likely because GenericRecordAvroDestinations is not declared as static, so it's a
regular inner class and captures the enclosing Test class.
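
To illustrate the point about static, here is a minimal sketch using plain Java serialization. The names EnclosingTest, InnerDestinations and StaticDestinations are hypothetical stand-ins, not the classes from the report, and no Beam or Avro types are involved. The inner class reads a field of the enclosing class so that the hidden reference to the enclosing instance is certain to be kept.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for the test class; deliberately NOT Serializable.
public class EnclosingTest {
    private final String name = "enclosing";

    // Regular inner class: each instance holds a hidden reference to
    // EnclosingTest.this, so serializing it also tries to serialize the test.
    class InnerDestinations implements Serializable {
        String describe() {
            return name; // uses the enclosing instance
        }
    }

    // Static nested class: no reference to an enclosing instance.
    static class StaticDestinations implements Serializable {
    }

    // Returns true if Java serialization of the object succeeds.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        EnclosingTest test = new EnclosingTest();
        System.out.println(canSerialize(test.new InnerDestinations())); // prints false
        System.out.println(canSerialize(new StaticDestinations()));     // prints true
    }
}
```

One HotSpot option that prints the path to a non-serializable value in the exception message is -Dsun.io.serialization.extendedDebugInfo=true. Whether that is the option meant in the comment above is an assumption.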

Other than that: I don't quite understand this example. In the example, you definitely already
have the schema available via "SCHEMA". I mean - I understand what your example does, but
I don't see how it motivates the need for a schemaless write(), because in this example the
schema is known, and I'm having a hard time coming up with an example where it wouldn't be.

> AvroIO.write without specifying a schema
> ----------------------------------------
>                 Key: BEAM-2993
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
> Similarly to, we should be able to write
> to Avro files using {{AvroIO}} without specifying a schema at build time. Consider the following
> use case: a user has a {{PCollection<GenericRecord>}} but the schema is only known
> while running the pipeline. {{AvroIO.writeGenericRecords}} needs the schema, but the schema
> is already available in {{GenericRecord}}. We should be able to call {{AvroIO.writeGenericRecords()}}
> with no schema.

This message was sent by Atlassian JIRA
