samza-dev mailing list archives

From Roger Hoover <roger.hoo...@gmail.com>
Subject Re: Joining Avro records
Date Thu, 09 Apr 2015 19:54:35 GMT
Yi Pan,

Thanks for your response.  I'm thinking that I'll iterate over the fields
of the input schemas (similar to this
https://github.com/apache/samza/blob/samza-sql/samza-sql/src/main/java/org/apache/samza/sql/metadata/AvroSchemaConverter.java#L58-L62),
match them up with the output schema and then copy the values.  I'll let
you know how it goes in case it's useful.
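
A minimal sketch of that field-matching idea, for illustration only. It does
not use the Avro library: schemas are the parsed JSON from the original
question and records are plain dicts, and the function name join_records is
made up for this example. With the Avro Java API the same loop would
presumably walk Schema.getFields() and use GenericRecord.get()/put(). If two
inputs share a field name, this sketch takes the first match.

```python
import json

# The three schemas from the original question, parsed as plain JSON.
A_SCHEMA = json.loads('{"name": "A", "type": "record", "namespace": "fubar",'
                      ' "fields": [{"name": "a", "type": "int"}]}')
B_SCHEMA = json.loads('{"name": "B", "type": "record", "namespace": "fubar",'
                      ' "fields": [{"name": "b", "type": "int"}]}')
AB_SCHEMA = json.loads('{"name": "AB", "type": "record", "namespace": "fubar",'
                       ' "fields": [{"name": "a", "type": "int"},'
                       ' {"name": "b", "type": "int"}]}')


def join_records(output_schema, inputs):
    """Build an output record by copying matching fields from the inputs.

    inputs is a list of (schema, record) pairs; records are plain dicts.
    (Hypothetical helper, not part of Avro or Samza.)
    """
    out = {}
    for field in output_schema["fields"]:
        name = field["name"]
        for schema, record in inputs:
            # Index this input's fields by name so they can be matched.
            in_fields = {f["name"]: f for f in schema["fields"]}
            if name in in_fields:
                if in_fields[name]["type"] != field["type"]:
                    raise TypeError("type mismatch for field %r" % name)
                out[name] = record[name]
                break  # first input that supplies the field wins
        else:
            raise KeyError("no input supplies field %r" % name)
    return out


ab = join_records(AB_SCHEMA, [(A_SCHEMA, {"a": 1}), (B_SCHEMA, {"b": 2})])
print(ab)  # {'a': 1, 'b': 2}
```

Driving the loop from the output schema means a missing or mistyped field
fails loudly instead of being dropped silently.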

Cheers,

Roger

On Thu, Apr 9, 2015 at 12:07 PM, Yi Pan <nickpan47@gmail.com> wrote:

> Hi, Roger,
>
> Good question. I am actually not aware of any "automatic" way of
> doing this in Avro. I have tried adding generic Schema and Data interfaces
> in the samza-sql branch to address the morphing of schemas from input
> streams to output streams. The basic idea is to have wrapper Schema and
> Data classes on top of the deserialized objects that access the data fields
> according to the schema without changing or copying the actual data fields.
> Hence, when there is a need to morph the input data schemas into a new
> output data schema, we just need an implementation of the new output data
> Schema class that can read the corresponding data fields from the input
> data and write them out in the output schema. An interface function
> transform() is added to the Schema class for this exact purpose. Currently,
> it takes only one input data object; one example, a "projection"
> transformation, can be found in the implementation of the AvroSchema class.
> A join case like the one you presented may well be a reason to implement a
> "join" that takes multiple input data objects.
>
> All of the above is still experimental, so please feel free to
> provide feedback and comments. If we agree that this solution
> is good and suits a broader set of use cases, we can consider using it
> outside the "SQL" context as well.
>
> Best regards!
>
> -Yi
>
> On Thu, Apr 9, 2015 at 8:55 AM, Roger Hoover <roger.hoover@gmail.com>
> wrote:
>
> > Hi Milinda and others,
> >
> > This is an Avro question but since you guys are working on Avro support for
> > stream SQL, I thought I'd ask you for help.
> >
> > If I have two records of type A and B as below and want to join them
> > similar to "SELECT *" in SQL to produce a record of type AB, is there a
> > simple way to do this with Avro without writing code to copy each field
> > individually?
> >
> > I appreciate any help.
> >
> > Thanks,
> >
> > Roger
> >
> > {
> >   "name": "A",
> >   "type": "record",
> >   "namespace": "fubar",
> >   "fields": [{"name": "a", "type" : "int"}]
> > }
> >
> > {
> >   "name": "B",
> >   "type": "record",
> >   "namespace": "fubar",
> >   "fields": [{"name": "b", "type" : "int"}]
> > }
> >
> > {
> >   "name": "AB",
> >   "type": "record",
> >   "namespace": "fubar",
> >   "fields": [{"name": "a", "type" : "int"}, {"name": "b", "type" : "int"}]
> > }
> >
>
