kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: Kafka Streams Optimizations
Date Mon, 14 Dec 2020 22:44:21 GMT
Hello Navneeth,

Please find answers to your questions below.

On Sun, Dec 6, 2020 at 10:33 PM Navneeth Krishnan <reachnavneeth2@gmail.com> wrote:

> Hi All,
> I have been working on moving an application to kafka streams and I have
> the following questions.
> 1. We were planning to use an EFS mount to share rocksdb data for KV store
> and global state store with which we were hoping to minimize the state
> restore time when new instances are brought up. But later we found that
> global state stores require a lock so directories cannot be shared. Is
> there some way around this? How is everyone minimizing the state
> restoration time?
The common suggestion is to check whether, during restoration, the write
buffer is flushed too frequently, so that a lot of IO is incurred for
compaction: generally speaking you'd want a few large sst files instead of
a lot of small sst files. The default config may lean towards the latter
case, which would slow down restoration.
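
To make the trade-off concrete, here is a rough back-of-the-envelope
sketch. It assumes one flush (and so one new sst file) per full write
buffer, which simplifies RocksDB's real behaviour. The class name, the
4 GiB store size and both buffer sizes are illustrative assumptions, not
measured values. In Kafka Streams the actual buffer size would normally be
changed through a RocksDBConfigSetter registered under
"rocksdb.config.setter".

```java
public class FlushEstimate {

    // Rough number of write-buffer flushes (and so freshly written sst files)
    // needed to restore totalBytes, assuming one flush per full buffer.
    static long estimatedFlushes(long totalBytes, long writeBufferBytes) {
        // Ceiling division: a partially filled buffer still needs a final flush.
        return (totalBytes + writeBufferBytes - 1) / writeBufferBytes;
    }

    public static void main(String[] args) {
        long restoreBytes = 4L * 1024 * 1024 * 1024; // hypothetical 4 GiB store
        long smallBuffer = 16L * 1024 * 1024;        // illustrative 16 MiB buffer
        long largeBuffer = 128L * 1024 * 1024;       // illustrative 128 MiB buffer

        System.out.println("small buffer flushes: "
                + estimatedFlushes(restoreBytes, smallBuffer));
        System.out.println("large buffer flushes: "
                + estimatedFlushes(restoreBytes, largeBuffer));
    }
}
```

Fewer, larger files mean less compaction work competing with the restore
itself, at the cost of more memory per store.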

We are currently working on the following changes: using the addSSTFiles API
of RocksDB to try to reduce the restoration IO cost, and considering moving
restoration to a different thread so that the stream thread would not block
on IO when this happens. Stay tuned for the next release.

> 2. Topology optimization: We are using PAPI and as per the docs topology
> optimization will have no effects on low level api. Is my understanding
> correct?


> 3. There are about 5 KV stores in our stream application and for a few the
> data size is a bit larger. Is there a config to write data to the changelog
> topic only once a minute or something? I know it will be a problem in
> maintaining the data integrity. Basically we want to reduce the amount of
> changelog data written since we will have some updates for each user every
> 5 secs or so. Any suggestions on optimizations.
Currently, increasing the total cache size (configured as
"cache.max.bytes.buffering") may help. This is because the caching layer is
also used to suppress consecutive updates for the same key, and hence it
reduces the writes to the changelogs as well.
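
A minimal sketch of what that could look like, using plain
java.util.Properties so it runs without the Kafka jars. The class and
method names and the concrete values (100 MiB, one minute) are
assumptions for illustration. "commit.interval.ms" is included because the
cache is also flushed on every commit, so a short commit interval limits
how much a larger cache can deduplicate. Stores created through the
Processor API also need caching switched on in their store builder (for
example with withCachingEnabled()) for this to have any effect.

```java
import java.util.Properties;

public class CacheTuning {

    // Builds the Streams settings that control how long updates for the same
    // key can be coalesced before they reach the changelog topic.
    static Properties cacheProps(long cacheBytes, long commitIntervalMs) {
        Properties props = new Properties();
        // Total record cache shared by all stream threads of this instance.
        props.put("cache.max.bytes.buffering", Long.toString(cacheBytes));
        // The cache is flushed on commit, so this bounds the dedup window.
        props.put("commit.interval.ms", Long.toString(commitIntervalMs));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values: 100 MiB of cache, commit once a minute.
        Properties props = cacheProps(100L * 1024 * 1024, 60_000L);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These properties would be merged into the ones passed to the KafkaStreams
constructor. A longer commit interval also means more reprocessing after a
failure, so the one-minute value is a trade-off rather than a
recommendation.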

> 4. Compress data: Is there an option to compress the data being sent and
> consumed from kafka only for the intermediate topics. The major reason is
> we don't want to change the final sink because it's used by many
> applications. If we can just compress and write the data only for the
> intermediate topics and changelog that would be nice.
I think you can set the compression codec on a per-topic basis in Kafka.
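
One way this could be wired up from the Streams side is sketched below,
again with plain java.util.Properties. It relies on the "topic." prefix
that Kafka Streams applies to the internal topics it creates (changelog
and repartition topics), combined with the topic-level
"compression.type" setting. The class name and the choice of "lz4" are
assumptions. The setting only applies when Streams creates the topics;
internal topics that already exist would have to be altered on the broker
side instead.

```java
import java.util.Properties;

public class InternalTopicCompression {

    // Topic-level compression for the internal topics Streams creates.
    // The sink topic is managed outside Streams and is not affected.
    static Properties compressionProps(String codec) {
        Properties props = new Properties();
        // "topic." prefix + topic-level config name "compression.type".
        props.put("topic.compression.type", codec);
        return props;
    }

    public static void main(String[] args) {
        Properties props = compressionProps("lz4"); // illustrative codec choice
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Because the codec is set on the topic rather than on the producer, the
final sink topic keeps whatever configuration it already has.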

> Thanks and appreciate all the help.
> Regards,
> Navneeth

-- Guozhang
