kafka-users mailing list archives

From Shikhar Bhushan <shik...@confluent.io>
Subject Re: Kafka ETL for Parquet
Date Tue, 02 Aug 2016 22:41:36 GMT
Hi Kidong,

What specific issues did you run into when trying this out?

I think the basic idea would be to depend on the avro-serializer
package and implement your custom Converter along the lines of the
AvroConverter. You only need the deserialization bits (`toConnectData`),
and can stub out `fromConnectData`, since the HDFS connector, being a
'sink connector', will not exercise the latter. The avro-serializer
package does pull in a dependency on kafka-schema-registry-client, since
it uses the `SchemaRegistryClient` interface. You can supply your own
implementation here; not all methods are needed for the deserialization
bits, so it need not be complete.
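
For concreteness, here is a minimal sketch of the shape such a Converter
could take. Note that rather than wiring avro-serializer up with a custom
`SchemaRegistryClient`, this variant skips the registry wire framing
entirely and assumes the records are plain binary Avro, deserializing
them against a schema bundled on the classpath and using `AvroData` as
the glue into Connect's data model. The class name and the
/my-value.avsc resource are placeholders:

import java.io.IOException;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.errors.DataException;
import org.apache.kafka.connect.storage.Converter;

import io.confluent.connect.avro.AvroData;

public class ClasspathAvroConverter implements Converter {

    private Schema avroSchema;
    private GenericDatumReader<GenericRecord> reader;
    private AvroData avroData;

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        try {
            // Writer schema bundled with the connector; no registry lookup.
            avroSchema = new Schema.Parser().parse(
                getClass().getResourceAsStream("/my-value.avsc"));
        } catch (IOException e) {
            throw new DataException("Failed to load Avro schema", e);
        }
        reader = new GenericDatumReader<>(avroSchema);
        avroData = new AvroData(100); // schema cache size
    }

    @Override
    public SchemaAndValue toConnectData(String topic, byte[] value) {
        if (value == null)
            return SchemaAndValue.NULL;
        try {
            GenericRecord record = reader.read(null,
                DecoderFactory.get().binaryDecoder(value, null));
            // AvroData translates the Avro record into Connect's data model.
            return avroData.toConnectData(avroSchema, record);
        } catch (IOException e) {
            throw new DataException("Avro deserialization failed for topic "
                + topic, e);
        }
    }

    @Override
    public byte[] fromConnectData(String topic,
            org.apache.kafka.connect.data.Schema schema, Object value) {
        // Never called by a sink connector, so safe to leave unimplemented.
        throw new UnsupportedOperationException("Deserialization only");
    }
}

With that on the classpath, you would point the worker at it via the
`value.converter` config.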

Best,

Shikhar

On Mon, Aug 1, 2016 at 5:44 PM Kidong Lee <mykidong@gmail.com> wrote:

> Thanks for your interest Shikhar,
>
> Actually, I asked about and discussed this in the thread:
>
> https://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3CCAE1jLMOnYb2ScNweoBdsXRHOxjYLe=Ha-6igLDNTL95aBUyXBg@mail.gmail.com%3E
> The problem for me was that it was not easy to understand the Connect
> internal data structure. I tried writing an AvroConverter as you
> mentioned, but I could not get it to run correctly.
> I could not find a way to avoid the Schema Registry when writing an
> AvroConverter.
>
> Could you give me a concrete implementation of an AvroConverter that
> supports, for instance, a classpath-based Avro schema registry?
>
> - Kidong.
>
> 2016-08-02 7:40 GMT+09:00 Shikhar Bhushan <shikhar@confluent.io>:
>
> > Er, mislinked HDFS connector :)
> > https://github.com/confluentinc/kafka-connect-hdfs
> >
> >
> > On Mon, Aug 1, 2016 at 3:39 PM Shikhar Bhushan <shikhar@confluent.io>
> > wrote:
> >
> > > Hi Kidong,
> > >
> > > That's pretty cool! I'm curious what this offers over the Confluent
> > > HDFS connector <https://github.com/mykidong/kafka-etl-consumer>,
> > > though.
> > >
> > > The README mentions not depending on the Schema Registry, and that
> > > the schema can be retrieved via the classpath and Consul. This
> > > functionality should actually be pluggable in Connect by implementing
> > > a custom `Converter`; e.g. the Schema Registry comes with an
> > > AvroConverter which acts as the glue. Converter classes can be
> > > specified with the `key.converter` and `value.converter` configs.
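> > >
> > > For example, a worker configured with the stock AvroConverter would
> > > include something like this (the registry URL is a placeholder):
> > >
> > >   key.converter=io.confluent.connect.avro.AvroConverter
> > >   key.converter.schema.registry.url=http://localhost:8081
> > >   value.converter=io.confluent.connect.avro.AvroConverter
> > >   value.converter.schema.registry.url=http://localhost:8081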
> > >
> > > Best,
> > >
> > > Shikhar
> > >
> > > On Mon, Aug 1, 2016 at 1:56 AM Kidong Lee <mykidong@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> I have written a simple Kafka ETL job which consumes Avro-encoded
> > >> data from Kafka and saves it to Parquet on HDFS:
> > >> https://github.com/mykidong/kafka-etl-consumer
> > >>
> > >> It is implemented with the Kafka Consumer API and the Parquet Writer
> > >> API.
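> > >>
> > >> The core loop looks roughly like this (a simplified sketch; the
> > >> topic, output path, and schema loading are placeholders for the real
> > >> config):
> > >>
> > >> import java.util.Collections;
> > >> import java.util.Properties;
> > >>
> > >> import org.apache.avro.Schema;
> > >> import org.apache.avro.generic.GenericDatumReader;
> > >> import org.apache.avro.generic.GenericRecord;
> > >> import org.apache.avro.io.DecoderFactory;
> > >> import org.apache.hadoop.fs.Path;
> > >> import org.apache.kafka.clients.consumer.ConsumerRecord;
> > >> import org.apache.kafka.clients.consumer.KafkaConsumer;
> > >> import org.apache.parquet.avro.AvroParquetWriter;
> > >> import org.apache.parquet.hadoop.ParquetWriter;
> > >>
> > >> public class KafkaToParquet {
> > >>     public static void main(String[] args) throws Exception {
> > >>         Properties props = new Properties();
> > >>         props.put("bootstrap.servers", "localhost:9092");
> > >>         props.put("group.id", "parquet-etl");
> > >>         props.put("key.deserializer",
> > >>             "org.apache.kafka.common.serialization.ByteArrayDeserializer");
> > >>         props.put("value.deserializer",
> > >>             "org.apache.kafka.common.serialization.ByteArrayDeserializer");
> > >>
> > >>         Schema schema = new Schema.Parser().parse(
> > >>             KafkaToParquet.class.getResourceAsStream("/my-value.avsc"));
> > >>         GenericDatumReader<GenericRecord> reader =
> > >>             new GenericDatumReader<>(schema);
> > >>
> > >>         try (KafkaConsumer<byte[], byte[]> consumer =
> > >>                  new KafkaConsumer<>(props);
> > >>              ParquetWriter<GenericRecord> writer = AvroParquetWriter
> > >>                  .<GenericRecord>builder(new Path("hdfs:///tmp/events.parquet"))
> > >>                  .withSchema(schema)
> > >>                  .build()) {
> > >>             consumer.subscribe(Collections.singletonList("events"));
> > >>             while (true) {
> > >>                 for (ConsumerRecord<byte[], byte[]> r : consumer.poll(1000)) {
> > >>                     // Decode the Avro payload, append to the Parquet file.
> > >>                     GenericRecord record = reader.read(null,
> > >>                         DecoderFactory.get().binaryDecoder(r.value(), null));
> > >>                     writer.write(record);
> > >>                 }
> > >>             }
> > >>         }
> > >>     }
> > >> }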
> > >>
> > >> - Kidong Lee.
> > >>
> > >
> >
>
