kafka-users mailing list archives

From Navneeth Krishnan <reachnavnee...@gmail.com>
Subject Re: Kafka Streams Optimizations
Date Tue, 15 Dec 2020 00:08:54 GMT
Thanks Guozhang for the response. Appreciate the help.

Regards,
Navneeth

On Mon, Dec 14, 2020 at 2:44 PM Guozhang Wang <wangguoz@gmail.com> wrote:

> Hello Navneeth,
>
> Please find answers to your questions below.
>
> On Sun, Dec 6, 2020 at 10:33 PM Navneeth Krishnan <reachnavneeth2@gmail.com>
> wrote:
>
> > Hi All,
> >
> > I have been working on moving an application to Kafka Streams, and I
> > have the following questions.
> >
> > 1. We were planning to use an EFS mount to share rocksdb data for the
> > KV stores and the global state store, with which we were hoping to
> > minimize the state restore time when new instances are brought up. But
> > we later found that global state stores require a lock, so the
> > directories cannot be shared. Is there some way around this? How is
> > everyone minimizing the state restoration time?
> >
> >
> The common suggestion is to check whether, during restoration, the
> write-buffer is flushed too frequently so that a lot of IO is incurred
> for compaction: generally speaking you'd want a few large sst files
> instead of a lot of small sst files, and the default config may lean
> towards the latter case, which would slow down restoration.
>
> We are currently working on the following changes: use the addSSTFiles
> API of RocksDB to reduce the restoration IO cost, and move restoration
> to a separate thread so that the stream thread does not block on IO
> while it happens. Stay tuned for the next release.
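>
> In the meantime, one knob you can already turn yourself is a custom
> RocksDBConfigSetter to make the write-buffer larger. A minimal sketch;
> the 64 MB buffer size and the buffer count of 4 are illustrative
> assumptions, not recommendations:
>
>     import java.util.Map;
>     import org.apache.kafka.streams.state.RocksDBConfigSetter;
>     import org.rocksdb.Options;
>
>     public class BulkLoadConfigSetter implements RocksDBConfigSetter {
>         @Override
>         public void setConfig(final String storeName, final Options options,
>                               final Map<String, Object> configs) {
>             // larger memtables flush less often -> fewer, larger sst files
>             options.setWriteBufferSize(64 * 1024 * 1024L);
>             options.setMaxWriteBufferNumber(4);
>         }
>
>         @Override
>         public void close(final String storeName, final Options options) {
>             // nothing allocated above, so nothing to close
>         }
>     }
>
> and register the class via the "rocksdb.config.setter" config.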
>
>
> > 2. Topology optimization: We are using the PAPI, and as per the docs,
> > topology optimization has no effect on the low-level API. Is my
> > understanding correct?
> >
>
> Correct.
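>
> For reference, the optimization flag only rewrites DSL-built topologies,
> so enabling it is a no-op for a PAPI topology; a minimal sketch:
>
>     import java.util.Properties;
>     import org.apache.kafka.streams.StreamsConfig;
>
>     final Properties props = new Properties();
>     // rewrites only DSL-built topologies; has no effect on PAPI ones
>     props.put(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);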
>
> >
> > 3. There are about 5 KV stores in our stream application, and for a
> > few of them the data size is fairly large. Is there a config to write
> > data to the changelog topic only once a minute or so? I know it could
> > be a problem for data integrity. Basically we want to reduce the
> > amount of changelog data written, since we will have some updates for
> > each user every 5 secs or so. Any suggestions on optimizations?
> >
> >
> Currently, increasing the total cache size (configured via
> "cache.max.bytes.buffering") may help: the caching layer also serves to
> suppress consecutive updates for the same key, and hence reduces the
> writes to the changelogs as well.
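>
> A minimal sketch of the relevant configs; the 100 MB cache and the 60
> second commit interval are illustrative assumptions for your
> once-a-minute goal:
>
>     import java.util.Properties;
>     import org.apache.kafka.streams.StreamsConfig;
>
>     final Properties props = new Properties();
>     // bigger cache -> more same-key updates suppressed before flushing
>     props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 100 * 1024 * 1024L);
>     // caches are flushed on commit, so a longer commit interval also
>     // means fewer changelog writes (at the cost of more re-processing
>     // after a failure)
>     props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 60_000L);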
>
>
> > 4. Compress data: Is there an option to compress the data sent to and
> > consumed from Kafka only for the intermediate topics? The major reason
> > is that we don't want to change the final sink, because it's used by
> > many applications. If we could compress the data only for the
> > intermediate topics and changelogs, that would be nice.
> >
> >
> I think you can set the compression codec on a per-topic basis in Kafka.
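>
> For example, with the Admin client; the changelog topic name below is a
> hypothetical placeholder (Streams names its internal topics
> <application.id>-<store-name>-changelog):
>
>     import java.util.List;
>     import java.util.Map;
>     import java.util.Properties;
>     import org.apache.kafka.clients.admin.Admin;
>     import org.apache.kafka.clients.admin.AdminClientConfig;
>     import org.apache.kafka.clients.admin.AlterConfigOp;
>     import org.apache.kafka.clients.admin.ConfigEntry;
>     import org.apache.kafka.common.config.ConfigResource;
>
>     public class SetChangelogCompression {
>         public static void main(final String[] args) throws Exception {
>             final Properties conf = new Properties();
>             conf.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
>             try (Admin admin = Admin.create(conf)) {
>                 final ConfigResource changelog = new ConfigResource(
>                         ConfigResource.Type.TOPIC, "my-app-my-store-changelog");
>                 // set broker-side compression for just this topic
>                 final AlterConfigOp setCodec = new AlterConfigOp(
>                         new ConfigEntry("compression.type", "lz4"),
>                         AlterConfigOp.OpType.SET);
>                 admin.incrementalAlterConfigs(Map.of(changelog, List.of(setCodec)))
>                      .all().get();
>             }
>         }
>     }
>
> Note that setting compression on the producer via StreamsConfig would
> apply to everything the app produces, including the sink topics, so the
> per-topic broker config is the better fit here.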
>
>
> > Thanks and appreciate all the help.
> >
> > Regards,
> > Navneeth
> >
>
>
> --
> -- Guozhang
>
