kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: aggregation tables mirrored in kafka & rocksdb
Date Wed, 28 Feb 2018 21:31:56 GMT
Hello Nicu,

Is your aggregation windowed or non-windowed? If it is windowed, you can
specify the window specs so that the underlying RocksDB state store keeps
only the most recent windows, while Cassandra keeps the full history of all
past windows.
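
To make the retention idea concrete, here is a minimal, library-free sketch
in plain Java. It does not use the Kafka Streams API (there the retention is
configured on the window spec or the store); the class RecentWindowCounts,
the "key@windowStart" history key, and the HashMap standing in for Cassandra
are all hypothetical names chosen for illustration. The local store drops
windows older than the retention period, while every update is also written
to the long-term history.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a windowed count store with bounded local retention. */
class RecentWindowCounts {
    private final long windowSizeMs;
    private final long retentionMs;
    // window start -> (key -> count); the TreeMap keeps windows ordered by start time
    private final TreeMap<Long, Map<String, Long>> windows = new TreeMap<>();
    private long maxTimestampSeen = 0L;

    RecentWindowCounts(long windowSizeMs, long retentionMs) {
        this.windowSizeMs = windowSizeMs;
        this.retentionMs = retentionMs;
    }

    /** Start of the tumbling window that contains the (non-negative) timestamp. */
    long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    /** Counts one event; returns the updated count, or -1 if its window has expired locally. */
    long add(String key, long timestampMs) {
        maxTimestampSeen = Math.max(maxTimestampSeen, timestampMs);
        long oldestKept = maxTimestampSeen - retentionMs;
        // Evict every window that started before the retention horizon.
        windows.headMap(oldestKept, false).clear();
        long start = windowStart(timestampMs);
        if (start < oldestKept) {
            return -1L; // too late: the local window is gone
        }
        return windows.computeIfAbsent(start, w -> new HashMap<>()).merge(key, 1L, Long::sum);
    }

    int localWindowCount() {
        return windows.size();
    }
}

class RecentWindowCountsDemo {
    public static void main(String[] args) {
        RecentWindowCounts local = new RecentWindowCounts(60_000L, 120_000L);
        Map<String, Long> history = new HashMap<>(); // stand-in for the Cassandra table

        for (long ts : new long[] {0L, 30_000L, 70_000L, 200_000L}) {
            long count = local.add("a", ts);
            if (count >= 0) {
                history.put("a@" + local.windowStart(ts), count); // upsert latest count
            }
        }
        System.out.println("local windows: " + local.localWindowCount());
        System.out.println("history rows:  " + history.size());
    }
}
```

In this run the local store ends up holding a single window while the
history map holds all three, which is the split between RocksDB and Cassandra
described above.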

You can, of course, implement your own state store that talks directly to
Cassandra (the StateStore interface allows users to plug in their own
storage mechanism, either local or remote), but to optimize latency you may
want a local in-memory cache with write-back, so that access to your
Cassandra cluster is batched.
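
As a rough illustration of the write-back idea, here is a plain-Java sketch
that is independent of Kafka Streams and of any Cassandra driver. The class
WriteBackCache, the maxDirtyEntries threshold, and the batchWriter callback
are hypothetical; batchWriter stands in for whatever batched write you would
issue against Cassandra. The sketch assumes the cache starts empty against an
empty table and never evicts; a real store would also need read-through on a
cache miss and a flush whenever the application commits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Write-back cache: updates land in memory and reach the remote store in batches. */
class WriteBackCache {
    private final int maxDirtyEntries;
    private final Consumer<Map<String, Long>> batchWriter; // hypothetical remote writer
    private final Map<String, Long> cache = new HashMap<>(); // latest aggregate per key
    private final Map<String, Long> dirty = new HashMap<>(); // entries not yet written out

    WriteBackCache(int maxDirtyEntries, Consumer<Map<String, Long>> batchWriter) {
        this.maxDirtyEntries = maxDirtyEntries;
        this.batchWriter = batchWriter;
    }

    /** Reads are served from memory only in this sketch. */
    Long get(String key) {
        return cache.get(key);
    }

    /** Adds delta to the aggregate for key; repeated updates to one key coalesce. */
    void add(String key, long delta) {
        long updated = cache.merge(key, delta, Long::sum);
        dirty.put(key, updated);
        if (dirty.size() >= maxDirtyEntries) {
            flush();
        }
    }

    /** Writes all dirty entries as one batch and marks them clean. */
    void flush() {
        if (dirty.isEmpty()) {
            return;
        }
        batchWriter.accept(new HashMap<>(dirty)); // hand over a copy of the batch
        dirty.clear();
    }
}

class WriteBackCacheDemo {
    public static void main(String[] args) {
        List<Map<String, Long>> batches = new ArrayList<>(); // records each remote write
        WriteBackCache cache = new WriteBackCache(2, batches::add);

        cache.add("a", 1);
        cache.add("a", 2); // coalesces with the previous update to "a"
        cache.add("b", 5); // second dirty key triggers a batch write
        cache.add("c", 1);
        cache.flush();     // e.g. on commit or shutdown

        System.out.println("remote batches: " + batches);
    }
}
```

Four updates turn into two batched writes here. The trade-off is that
updates still sitting in the dirty map are lost if the process crashes before
a flush, so flushing has to be tied to your commit points.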


On Wed, Feb 28, 2018 at 5:45 AM, Marasoiu, Nicu <
nicu.marasoiu@metrosystems.net> wrote:

> Hi,
> Currently we have an aggregation system (without kafka) where events are
> aggregated into Cassandra tables holding aggregate results.
> We are considering moving to a KafkaStreams solution with exactly-once
> processing but in this case it seems that all the aggregation tables
> (reaching TB) need to be kept also in Kafka as ktables(Rocksdb)+compacted
> topics(Kafka) and the direction of computation would be: events topic -> KS
> aggregation -> aggregated topics -> one way sync to Cassandra using
> connector.
> This poses two problems:
> - the doubling of the total storage required for the system (which mainly
> stores aggregates), from 3 C* replicas to 2-3 K replicas + Rocksdb
> - the time to reconstruct a Rocksdb instance during rollover update can be
> half an hour if the rollover is fast
> Is there any way in Kafka Streams (even dropping the exactly once) in
> which we can work just with aggregate tables in Cassandra? For sure there
> is a way working with Kafka Consumer, pulling a batch of messages,
> aggregating, and adding aggregate to Cassandra. Not sure if possible with
> KafkaStreams given the higher level / FP modeling with its own clear
> advantages but this disadvantage.
> Please advise,
> Nicu Marasoiu

-- Guozhang
