kafka-users mailing list archives

From Andrian Jardan <andrianjar...@gmail.com>
Subject Re: Kafka Streams & Distributed state question
Date Thu, 14 Feb 2019 09:01:55 GMT
This is exactly what I was looking for, many thanks for the suggestion!

If there are any other solutions, I would be happy to hear about them, though.

Retrying and querying other nodes sounds like an approach that would work, but a much better
option would be to reuse the hashing algorithm Kafka itself uses to partition the data and
send queries to the nodes that most likely hold the data.
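
To make the idea concrete, here is a rough sketch of what I have in mind. It is only an
illustration under several assumptions: keys are strings serialized as UTF-8, records were
produced with the default partitioner (murmur2 of the key bytes, made positive, modulo the
partition count), and the store is keyed the same way as the input topic. The class name
KeyRouter, the host list, and the key are made up for the example, and the hash function
should be checked against Kafka's own org.apache.kafka.common.utils.Utils before relying on
it. As far as I can tell from the Interactive Queries docs, KafkaStreams#metadataForKey does
this lookup inside the library, so that is probably the better route in practice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: maps a key to the instance expected to hold it.
final class KeyRouter {

    // 32-bit murmur2, written to mirror the variant Kafka uses for key hashing
    // (assumption: verify against Kafka's Utils.murmur2 before relying on it).
    static int murmur2(byte[] data) {
        final int m = 0x5bd1e995;
        final int r = 24;
        int length = data.length;
        int h = 0x9747b28c ^ length;

        // Mix the input four bytes at a time (little-endian).
        for (int i = 0; i + 4 <= length; i += 4) {
            int k = (data[i] & 0xff)
                  | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16)
                  | ((data[i + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }

        // Mix the remaining 1 to 3 bytes; the cases fall through on purpose.
        int tail = length & ~3;
        switch (length % 4) {
            case 3: h ^= (data[tail + 2] & 0xff) << 16;
            case 2: h ^= (data[tail + 1] & 0xff) << 8;
            case 1: h ^= data[tail] & 0xff;
                    h *= m;
        }

        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Partition for a key: clear the sign bit, then take the remainder.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return (murmur2(keyBytes) & 0x7fffffff) % numPartitions;
    }

    // Assumes hostByPartition.get(p) is the instance currently serving partition p.
    static String hostFor(String key, List<String> hostByPartition) {
        return hostByPartition.get(partitionFor(key, hostByPartition.size()));
    }

    public static void main(String[] args) {
        // Made-up instances, one per partition of a four-partition topic.
        List<String> hosts = Arrays.asList("app-0:8080", "app-1:8080", "app-2:8080", "app-3:8080");
        System.out.println("user-42 -> " + hostFor("user-42", hosts));
    }
}
```

The weak spot is the partition-to-host table: it changes on every rebalance, so it would
have to be refreshed from the running application rather than hard-coded as above.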

—
Andrian Jardan
Infrastructure and DevOps expert
cell: +49 174 2815994
Skype: macrosdnb

> On Feb 13, 2019, at 18:56, Bill Bejeck <bill@confluent.io> wrote:
> 
> Hi Andrian,
> 
> There is an existing framework for Interactive Queries contributed by
> Lightbend - https://github.com/lightbend/kafka-streams-query
> 
> HTH,
> Bill
> 
> 
> 
> On Wed, Feb 13, 2019 at 10:14 AM Ryanne Dolan <ryannedolan@gmail.com> wrote:
> 
>> Andrian, this looks useful:
>> 
>> 
>> https://docs.confluent.io/current/streams/developer-guide/interactive-queries.html
>> 
>> The gist is you'd need to build your own routing and discovery layers.
>> 
>> Also consider materializing your data to an external data store, e.g.
>> Couchbase or Hive, which provides this functionality already.
>> 
>> Ryanne
>> 
>> On Wed, Feb 13, 2019, 5:16 AM Andrian Jardan <andrianjardan@gmail.com> wrote:
>> 
>>> Hello dear Kafka community,
>>> 
>>> We are planning to build a Kafka Streams application that will build a
>>> pretty big state (~100 GB) in real time from various sources.
>>> 
>>> We later need to query this state as fast as possible, and the main
>>> idea is to use the built-in in-memory RocksDB and get the data via
>>> some sort of REST API we will build on top. The question is how we
>>> identify where the data we need is, since the state will obviously
>>> not fit in the memory of a single instance and we need to scale somehow
>>> in case this state keeps growing, and it will…
>>> 
>>> Is there a way to identify where the data we care about resides, i.e. on
>>> which Kafka Streams app instance?
>>> 
>>> I tried to find the answer in the documentation, but was not able to
>>> figure it out, unfortunately.
>>> 
>>> Thank you in advance!
>>> 
>>> —
>>> Andrian Jardan
>>> Infrastructure and DevOps expert
>>> cell: +49 174 2815994
>>> Skype: macrosdnb
>>> 
>>> 
>> 

