kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dave Hamilton <dhamil...@nanigans.com>
Subject Re: Kafka connector throughput reduction upon avro schema change
Date Thu, 06 Jul 2017 17:04:52 GMT
Bumping this. Has anyone here observed this in their Kafka connect deployments?


On 5/26/17, 1:44 PM, "Dave Hamilton" <dhamilton@nanigans.com> wrote:

    We are currently using the Kafka S3 connector to ship Avro data to S3. We made a change
to one of our Avro schemas and have noticed consumer throughput on the Kafka connector drop
considerably. I am wondering if there is anything we can do to avoid such issues when we update
schemas in the future?
    This is what I believe is happening:
    ·         The avro producer application is running on 12 instances. They are restarted
in a rolling fashion, switching from producing schema version 1 before the restart to schema
version 2 afterward.
    ·         While the rolling restart is occurring, data on schema version 1 and schema
version 2 is simultaneously being written to the topic.
    ·         The Kafka connector has to close the current avro file for a partition and
ship it whenever it detects a schema change, which is happening several times due to the rolling
nature of the schema update deployment and the mixture of message versions being written during
this time. This process causes the overall consumer throughput to plummet.
    Am I reasoning correctly about what we’re observing here? Is there any way to avoid
this when we change schemas (short of stopping all instances of the service and bringing them
up together on the new schema version)?

View raw message