kafka-users mailing list archives

From Andrew Otto <o...@wikimedia.org>
Subject JSONSchema Kafka Connect Converter
Date Tue, 23 Jan 2018 19:17:52 GMT
Hi all,

I’ve been thinking a lot recently about JSON and Kafka.  Because JSON is
not strongly typed, it isn’t treated as a first-class citizen of the Kafka
ecosystem.  At Wikimedia, we use JSONSchema-validated JSON
<https://blog.wikimedia.org/2017/01/13/json-hadoop-kafka/> for Kafka
messages.  This makes it so easy for our many disparate teams and services
to consume data from Kafka, without having to consult a remote schema
registry to read data.  (Yes we have to worry about schema evolution, but
we do this on the producer side by requiring that the only schema change
allowed is adding optional fields.)

There’s been discussion
<https://github.com/confluentinc/schema-registry/issues/220> about
JSONSchema support in Confluent’s Schema registry, or perhaps even support
to produce validated Avro JSON (not binary) from Kafka REST proxy.

However, the more I think about this, the more I realize that I don’t really care
about JSON support in Confluent products.  What I (and I betcha most of the
folks who commented on the issue
<https://github.com/confluentinc/schema-registry/issues/220>) really want
is the ability to use Kafka Connect with JSON data.  Kafka Connect does
sort of support this, but only if your JSON messages conform to its very
specific envelope schema format.
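
For context: when schemas.enable=true, Connect's JsonConverter expects every
message to carry its own schema next to the data, in a schema/payload
envelope. A small illustration of the difference follows; the event and its
field names (id, title) are made up for the example.

```python
import json

# Hypothetical event as a service would normally produce it (plain JSON).
plain_message = {"id": 42, "title": "Main_Page"}

# The same event wrapped in the schema/payload envelope that Connect's
# JsonConverter reads when schemas.enable=true.
enveloped_message = {
    "schema": {
        "type": "struct",
        "optional": False,
        "fields": [
            {"field": "id", "type": "int64", "optional": False},
            {"field": "title", "type": "string", "optional": False},
        ],
    },
    "payload": plain_message,
}

print(json.dumps(enveloped_message, indent=2))
```

Every producer would have to emit the second form, which is why plain
JSONSchema-validated topics do not fit JsonConverter as-is.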

What if…Kafka Connect provided a JSONSchemaConverter (*not* Connect’s
JsonConverter), that knew how to convert between a provided JSONSchema and
Kafka Connect internal Schemas?  Would this enable what I think it would?
Would this allow for configuration of Connectors with JSONSchemas to read
JSON messages directly from a Kafka topic?  Once read and converted to a
ConnectRecord, the messages could be used with any Connector out there,
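
To make the idea concrete, here is a rough sketch of the schema-mapping step
such a converter would need. It is not an existing Connect API: the function
name to_connect_schema is hypothetical, it handles only a small subset of
JSONSchema, and it emits the JSON form of a Connect schema (as used in the
JsonConverter envelope) rather than real Connect Schema objects. Mapping
"integer" to int64 and "number" to double is an assumption, since JSONSchema
does not specify numeric widths.

```python
import json

# Assumed mapping from JSONSchema primitive types to Connect type names.
PRIMITIVES = {
    "string": "string",
    "integer": "int64",
    "number": "double",
    "boolean": "boolean",
}


def to_connect_schema(json_schema, optional=False):
    """Convert a (subset of a) JSONSchema dict to a Connect-style schema dict."""
    schema_type = json_schema["type"]
    if schema_type in PRIMITIVES:
        return {"type": PRIMITIVES[schema_type], "optional": optional}
    if schema_type == "array":
        return {
            "type": "array",
            "items": to_connect_schema(json_schema["items"]),
            "optional": optional,
        }
    if schema_type == "object":
        required = set(json_schema.get("required", []))
        fields = []
        for name, sub_schema in json_schema.get("properties", {}).items():
            # A property is optional unless JSONSchema lists it as required.
            field = to_connect_schema(sub_schema, optional=name not in required)
            field["field"] = name
            fields.append(field)
        return {"type": "struct", "fields": fields, "optional": optional}
    raise ValueError(f"unsupported JSONSchema type: {schema_type!r}")


# Hypothetical JSONSchema for an event with one required and one optional field.
example = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["id"],
}

connect_schema = to_connect_schema(example)
print(json.dumps(connect_schema, indent=2))
```

Because only optional fields may be added under the evolution rule described
earlier, new properties would always map to optional struct fields in this
scheme.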

I might have space in the next year to work on something like this, but I
thought I’d ask here first to see what others thought.  Would this be
useful?  If so, is this something that might be upstreamed into Apache
Kafka?

- Andrew Otto
  Senior Systems Engineer
  Wikimedia Foundation
