kafka-users mailing list archives

From Michael Noll <mich...@confluent.io>
Subject Re: using a state store for deduplication
Date Mon, 27 Mar 2017 14:41:44 GMT
Jon,

Damian already answered your direct question, so this is just an FYI:

There's a demo example at
https://github.com/confluentinc/examples/blob/3.2.x/kafka-streams/src/test/java/io/confluent/examples/streams/EventDeduplicationLambdaIntegrationTest.java
(this is for Confluent 3.2 / Kafka 0.10.2.0).

Note that this code is for demonstration purposes only.  To make the example
more suitable for production use you could, for example, switch to a window
store, which expires old entries via its retention period, instead of manually
purging expired entries via `ReadOnlyKeyValueStore#all()` (iterating over the
whole store can be expensive).
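
To illustrate why a time-segmented store avoids the full scan, here is a
minimal plain-Java sketch. It is not the Kafka Streams `WindowStore` API and
not the code from the linked example; the class `BucketedDeduplicator`, its
method `firstSeen`, and the bucket/retention parameters are hypothetical names
made up for this illustration. Event IDs are grouped into fixed-size time
buckets, so expiring old data means dropping whole buckets rather than
visiting every entry. The sketch assumes timestamps arrive in roughly
increasing order.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical illustration only; this is NOT a Kafka Streams class. */
final class BucketedDeduplicator {
    private final long bucketSizeMs;
    private final long retentionMs;
    // Bucket start time -> event IDs first seen in that bucket, ordered by time.
    private final TreeMap<Long, Set<String>> buckets = new TreeMap<>();

    BucketedDeduplicator(long bucketSizeMs, long retentionMs) {
        if (bucketSizeMs <= 0 || retentionMs <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.bucketSizeMs = bucketSizeMs;
        this.retentionMs = retentionMs;
    }

    /** Returns true if eventId has not been seen in any retained bucket. */
    boolean firstSeen(String eventId, long timestampMs) {
        expire(timestampMs);
        // One hash lookup per retained bucket, not a scan over all entries.
        for (Set<String> ids : buckets.values()) {
            if (ids.contains(eventId)) {
                return false;
            }
        }
        long bucketStart = timestampMs - Math.floorMod(timestampMs, bucketSizeMs);
        buckets.computeIfAbsent(bucketStart, k -> new HashSet<>()).add(eventId);
        return true;
    }

    /** Drops every bucket whose whole time range is older than the retention period. */
    private void expire(long nowMs) {
        long cutoff = nowMs - retentionMs;
        // A bucket covers [start, start + bucketSizeMs); it is fully expired
        // when start + bucketSizeMs <= cutoff.
        buckets.headMap(cutoff - bucketSizeMs, true).clear();
    }

    public static void main(String[] args) {
        BucketedDeduplicator dedup = new BucketedDeduplicator(1_000L, 5_000L);
        System.out.println(dedup.firstSeen("a", 0L));     // true: never seen
        System.out.println(dedup.firstSeen("a", 2_500L)); // false: duplicate
        System.out.println(dedup.firstSeen("a", 7_000L)); // true: old bucket expired
    }
}
```

Because expiry works on whole buckets, an ID may be remembered for up to one
bucket length beyond the retention period; smaller buckets tighten that bound
at the cost of more lookups per record.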

Hope this helps,
Michael




On Mon, Mar 27, 2017 at 3:07 PM, Damian Guy <damian.guy@gmail.com> wrote:

> Jon,
> You don't need all the data for every topic as the data is partitioned by
> key. Therefore each state-store instance is de-duplicating a subset of the
> key set.
> Thanks,
> Damian
>
> On Mon, 27 Mar 2017 at 13:47 Jon Yeargers <jon.yeargers@cedexis.com>
> wrote:
>
> > I've been (re)reading this document
> > (http://docs.confluent.io/3.2.0/streams/developer-guide.html#state-stores)
> > hoping to better understand StateStores. At the top of the section there
> > is a tantalizing note implying that one could do deduplication using a
> > store.
> >
> > At present we are using Redis for this as it gives us a shared location.
> > I've been of the mind that a given store was local to a streams instance.
> > To truly support deduplication I would think one would need access to
> > _all_ the data for a topic and not just on a per-partition basis.
> >
> > Am I completely misunderstanding this?
> >
>
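
Damian's point about partitioning can be illustrated with a toy plain-Java
sketch. It assumes the records are keyed by the ID being deduplicated, and it
uses `String.hashCode()` as a stand-in for the hash the Kafka producer applies
to the serialized key. The class `PartitionedDedupSketch` and its methods are
hypothetical names for this illustration, not Kafka APIs. Since a given key
always maps to the same partition, each partition's local store sees every
occurrence of the keys it owns, and no shared store is required.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; this is NOT a Kafka class. */
final class PartitionedDedupSketch {

    /** Stand-in partitioner: the same key always yields the same partition. */
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /**
     * Deduplicates keys using one independent local set per partition,
     * mimicking one state-store instance per partition.
     */
    static List<String> deduplicate(List<String> keys, int numPartitions) {
        List<Set<String>> localStores = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            localStores.add(new HashSet<>());
        }
        List<String> firstOccurrences = new ArrayList<>();
        for (String key : keys) {
            // Only the store that owns this key's partition is consulted.
            Set<String> store = localStores.get(partitionFor(key, numPartitions));
            if (store.add(key)) {
                firstOccurrences.add(key);
            }
        }
        return firstOccurrences;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "a", "c", "b", "a");
        // The result is the same whether there is 1 partition or 3.
        System.out.println(deduplicate(keys, 1)); // [a, b, c]
        System.out.println(deduplicate(keys, 3)); // [a, b, c]
    }
}
```

If the duplicates to be removed are not identified by the record key, the
data would first have to be re-keyed (repartitioned) by the deduplication ID
for this reasoning to hold.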
