samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yi Pan <>
Subject Re: Joining Avro records
Date Thu, 09 Apr 2015 19:07:23 GMT
Hi, Roger,

Good question on that. I am actually not aware of any "automatic" way of
doing this in Avro. I have tried to add generic Schema and Data interface
in samza-sql branch to address the morphing of the schemas from input
streams to the output streams. The basic idea is to have wrapper Schema and
Data classes on-top-of the deserialized objects to access the data fields
according to the schema w/o changing and copying the actual data fields.
Hence, when there is a need to morph the input data schemas into a new
output data schema, we just need an implementation of the new output data
Schema class that can read the corresponding data fields from the input
data and write them out in the output schema. An interface function
transform() is added in the Schema class for this exact purpose. Currently,
it only takes one input data and one example of "projection" transformation
can be found in the implementation of AvroSchema class. A join case as you
presented may well be a reason to have an implementation of "join" with
multiple input data.

All the above solution is still experimental and please feel free to
provide your feedback and comments on that. If we agree that this solution
is good and suit for a broader use case, it can be considered to be used
outside the "SQL" context as well.

Best regards!


On Thu, Apr 9, 2015 at 8:55 AM, Roger Hoover <> wrote:

> Hi Milinda and others,
> This is an Avro question but since you guys are working on Avro support for
> stream SQL, I thought I'd ask you for help.
> If I have a two records of type A and B as below and want to join them
> similar to "SELECT *" in SQL to produce a record of type AB, is there an
> simple way to do this with Avro without writing code to copy each field
> individually?
> I appreciate any help.
> Thanks,
> Roger
> {
>   "name": "A",
>   "type": "record",
>   "namespace": "fubar",
>   "fields": [{"name": "a", "type" : "int"}]
> }
> {
>   "name": "B",
>   "type": "record",
>   "namespace": "fubar",
>   "fields": [{"name": "b", "type" : "int"}]
> }
> {
>   "name": "AB",
>   "type": "record",
>   "namespace": "fubar",
>   "fields": [{"name": "a", "type" : "int"}, {"name": "b", "type" : "int"}]
> }

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message