kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tom Szumowski <tszumow...@gmail.com>
Subject Re: Bemchmarks for KTable Joins and Queries
Date Sun, 04 Nov 2018 21:06:10 GMT
Wow thank you for the great details. It is very helpful.

We may shoot to try out some benchmarks in the next few weeks. I'll report
back if we find anything notable.

Thanks again!

On Nov 4, 2018 12:53, "Matthias J. Sax" <matthias@confluent.io> wrote:

Thanks for the details.

Don't think there is a answer with numbers. You would need to benchmark
this yourself. However, some details:

 - If you query a local store using a key-lookup (ie, point query) is
should be sub-millisecond latency for a high percentile. The read is
either served from the Kafka Streams store cache, or RocksDBs memcache,
or worst case disk. Thus, you can tune it with increases caches and it
depends on your storage hardware (eg, HDD vs SSD). Because of those
factors, it's hard to give a better answer.

 - If you do local range-quries, those are slower (I assume 10x
difference). But again, you would need to benchmark to get more details
as it depends on your hardware and configuration.

 - For querying remote stores: Kafka Streams only provides the basic
building blocks, and it's your application that implement the actual
routing and network communication. Thus, it depends mainly on your own
code. All metadata about all stores is exchanged in every rebalance, and
thus, for each instance it's either a local lookup, or one network hop
to the hosting instance of a key (it's always one hop for 5, 10, or 100)
instances (but it's your own code the implements this) for point-queries.

 - For general range queries, you might need to query all instances and
assemble the result, because due to hash-partitioning, a key-range can
be distributed over all instances.

 - For GlobalKTables/GlobalStores you always have a full copy of the
state (it's not sharded as in regular KTable/stores) and thus, you
always do a local point/range query. Not need to query any remote store.

Hope this helps.


On 11/3/18 4:53 PM, Tom Szumowski wrote:
> Hi Matthias.
> My apologies for my ambiguous descriptions. I responded in-line to your
> questions below. Thank you for taking the time to understand my question.
> On Fri, Nov 2, 2018 at 1:28 PM Matthias J. Sax <matthias@confluent.io>
> wrote:
>>>> At a high level, KTables provide a capability to query for data.
>> Can you elaborate? Do you refer to "Interactive Queries" feature?
> *Yes that's correct. I'm referring to Interactive queries as described
> <
> This description
> <

> covers
> several use cases for interactive queries including: real-time threat
> monitoring, video gaming, risk and fraud, and trend detection. Let's
> consider the video gaming use case for example. It states, "A mobile
> companion app can then directly query the Kafka Streams application to
> the current location of a player to friends and family, and invite them to
> come along". In that scenario, one may be very interested in knowing the
> throughput and latency (independent of network delays) for executing that
> Interactive Query on the mobile companion app in order to get locations.*

>>>> And I imagine latency/throughout of KTable queries depend on the number
>> of
>>>> consumers the query would have to touch to complete.
>> * Similar here. I am not sure if I can follow.*
> In that same article
> <

> above,
> it refers to local state stores and remote state stores. I'm interested in
> understanding latencies when querying a remote state store
> <

> The diagram in that link shows three instances. What if there are 100
> instances? How do the latencies in a query change as the number of
> instances increase (if at all)? What kind of latencies in that
> configuration would we expect? And how would it compare to an alternate
> configuration that utilizes a GlobalKTable
> <https://docs.confluent.io/current/streams/concepts.html#globalktable>?
> *I hope that helps clarify.*
> *-Tom*

>> -Matthias
>> On 11/2/18 4:27 AM, Tom Szumowski wrote:
>>> Thank you for the clarification. I understand they are fundamentally
>>> different underneath than a relational database, and may not be fair to
>>> compare directly.
>>> But how about benchmarks that aren't a comparison from other databases?
>>> At a high level, KTables provide a capability to query for data. Suppose
>> I
>>> have a requirement to fetch thr data in less than X milliseconds. It
>> seems
>>> fair to want to understand if a KTable can satisfy that capability under
>>> some configuration, or if I need to seek alternate solutions.
>>> And I imagine latency/throughout of KTable queries depend on the number
>> of
>>> consumers the query would have to touch to complete. For example, in
>>> complex joins or filters. Or perhaps a trade-off with GlobalKTables...
>>> That's the kind of information in terms of benchmarks I'd be interested
>> in
>>> knowing exists or not.
>>> Thank you,
>>> Tom
>>> On Thu, Nov 1, 2018, 16:24 Matthias J. Sax <matthias@confluent.io wrote:
>>>> I am not aware if benchmarks, but want to point out, that KTables work
>>>> somewhat different to relational database system. Thus, you might want
>>>> to evaluate not base on performance, but on the semantics KTable
>> provide.
>>>> Recall, that Kafka Streams is a stream processing library while a
>>>> database system is a "batch processing" system. It's two quite
>>>> types of systems and benchmarking them to compare each other is
>>>> questionable.
>>>> -Matthias
>>>> On 10/31/18 12:18 PM, Tom Szumowski wrote:
>>>>> I was wonderinf if anyone had or knew of benchmark tests for KTable or
>>>>> GlobalKTable queries/joins, as compared to alternatives such as
>>>> distributed
>>>>> databases.

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message