kafka-users mailing list archives

From Avi Flax <avi.f...@parkassist.com>
Subject Re: Streams RocksDB State Store Disk Usage
Date Thu, 30 Jun 2016 15:36:04 GMT
On Jun 29, 2016, at 22:44, Guozhang Wang <wangguoz@gmail.com> wrote:
> One way to mentally quantify your state store usage is to consider the
> total key space in your reduceByKey() operator, and multiply by the average
> key-value pair size. Then you need to consider the RocksDB write / space
> amplification factor as well.

That makes sense, thank you!
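For a rough sense of scale, the estimate Guozhang describes can be worked through with made-up numbers (the key count, pair size, and amplification factor below are illustrative assumptions, not measurements from any real topology):

```python
# Back-of-envelope state store sizing, per the suggestion above.
# All inputs are illustrative assumptions -- substitute your own measurements.

def estimate_state_bytes(total_keys, avg_pair_bytes, space_amplification):
    """Approximate on-disk size of a RocksDB-backed state store:
    total key space x average key-value pair size x amplification factor."""
    return total_keys * avg_pair_bytes * space_amplification

# e.g. 10M distinct keys in the reduceByKey() operator, ~200 bytes per
# key-value pair, and a guessed RocksDB space amplification of ~3x:
size = estimate_state_bytes(10_000_000, 200, 3)
print(size / 1e9)  # roughly 6 GB
```

The amplification factor in particular is a guess; the real value depends on the RocksDB compaction settings Kafka Streams uses.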

> Currently Kafka Streams hard-codes some RocksDB config values, such as block
> size, to achieve good write performance at the cost of write
> amplification, but we are now working on exposing those configs so that
> users can override them themselves:
> https://issues.apache.org/jira/browse/KAFKA-3740

That looks excellent for the next release ;)

In the meantime, do you know anything specific about the RocksDB behavior with the LOG and
LOG.old.{timestamp} files? (They don’t seem to me to be directly related to the storage
space required by the actual state itself, unless I’m misunderstanding the word “log”,
which is a bit overloaded in this community.) Is there something I can do in code to affect
this, or some way to understand and predict the growth patterns of these files? Does RocksDB
have some kind of built-in cleanup feature for them, or do I need to set up a cron job of
my own?
