kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: Kafka Streams Session store performance degradation from 0.10.2.1 to 0.11.0.3
Date Sat, 12 Jan 2019 01:31:44 GMT
Hello Jonathan,

I've left a comment on https://issues.apache.org/jira/browse/KAFKA-7652
with a fix that tries to resolve the discovered bug in trunk. If it is verified
to be the right fix, I will push it to older branches as well.

Just FYI.


Guozhang

On Fri, Nov 16, 2018 at 4:26 PM Guozhang Wang <wangguoz@gmail.com> wrote:

> Hi Jonathan,
>
> Could you create a JIRA with all the currently available information
> uploaded to the ticket, so that I can investigate the issue further? This way
> we will not lose track of it (the email list is not the best venue for
> potential bug investigation :).
>
> In the meantime, I will try to compare the source code of 0.10.2 and 2.0
> and see if I can eyeball any obvious issues.
>
> Guozhang
>
> On Thu, Nov 8, 2018 at 1:39 PM Matthias J. Sax <matthias@confluent.io> wrote:
>
>> Thanks for verifying.
>>
>> > From our perspective, it appears something happened after 0.10.2.1
>> > that made the LRU Cache much slower for our use case.
>>
>> That is what I am trying to figure out. I went over the 0.10.2.2 to
>> 0.11.0.3 JIRAs but found nothing I could point to. There are a couple of
>> SessionStore-related tickets, but none of them should have an effect
>> like this.
>>
>> To narrow it down, it would be helpful to test with other versions, too.
>> Maybe 0.10.2.2 and 0.11.0.0 to see when the issue was introduced.
>>
>> Can you also profile v0.10.2.1 so we can compare?
>>
>> > What would you recommend for our next steps?
>>
>> Not sure. If you could help us track down the issue, that would be
>> most helpful to get a fix in (and you could run from a SNAPSHOT version to
>> get the fix; not sure if this would be an option for you).
>>
>>
>> -Matthias
>>
>>
>>
>> On 11/7/18 3:47 PM, jonathangordon@newrelic.com wrote:
>> > Hi Matthias,
>> >
>> > I upgraded to 2.0.0 and we're experiencing the same problem. I've
>> > posted a new screengrab of a thread profile:
>> >
>> > https://imgur.com/a/2wncPHw
>> >
>> > From our perspective, it appears something happened after 0.10.2.1 that
>> > made the LRU Cache much slower for our use case. What would you recommend
>> > for our next steps?
>> >
>> > Jonathan
>> >
>> > On 2018/11/06 19:22:16, "Matthias J. Sax" <matthias@confluent.io> wrote:
>> >> Not sure atm why you see a performance degradation. Would need to dig
>> >> into the details.
>> >>
>> >> However, did you consider upgrading to 2.0 instead of 0.11?
>> >>
>> >> Also note that we added a new operator, `suppress()`, in the upcoming 2.1
>> >> release that allows you to do rate control without caching:
>> >>
>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables
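`suppress()` replaces cache-based rate control with explicit per-key buffering. The Java below is a minimal, stdlib-only model of the time-limited variant, not Kafka's implementation: the class `TimeLimitSuppressor` is hypothetical, and it assumes a key's latest value is emitted once stream time has advanced a fixed interval past that key's first buffered update. In the 2.1 DSL the real operator is applied to a `KTable`, roughly as `table.suppress(Suppressed.untilTimeLimit(Duration.ofSeconds(30), Suppressed.BufferConfig.unbounded()))`; the KIP-328 page above is the authority on the exact semantics.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of time-limited suppression; NOT Kafka's implementation.
public class TimeLimitSuppressor {
    // Latest buffered value per key, plus the stream time of its first buffered update.
    private static final class Pending {
        final long firstSeen;
        String latest;
        Pending(long firstSeen, String latest) { this.firstSeen = firstSeen; this.latest = latest; }
    }

    private final long timeLimitMs;
    private final Map<String, Pending> buffer = new LinkedHashMap<>();
    private final List<String> emitted = new ArrayList<>();
    private long streamTime = Long.MIN_VALUE;

    public TimeLimitSuppressor(long timeLimitMs) { this.timeLimitMs = timeLimitMs; }

    // Later updates to a buffered key overwrite its value but keep firstSeen.
    public void update(String key, String value, long timestamp) {
        Pending p = buffer.get(key);
        if (p == null) {
            buffer.put(key, new Pending(timestamp, value));
        } else {
            p.latest = value;
        }
        advance(timestamp);
    }

    // Stream time only moves forward; emit every key whose time limit has elapsed.
    public void advance(long timestamp) {
        streamTime = Math.max(streamTime, timestamp);
        Iterator<Map.Entry<String, Pending>> it = buffer.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Pending> e = it.next();
            if (e.getValue().firstSeen + timeLimitMs <= streamTime) {
                emitted.add(e.getKey() + "=" + e.getValue().latest);
                it.remove();
            }
        }
    }

    public List<String> emitted() { return emitted; }

    public static void main(String[] args) {
        TimeLimitSuppressor s = new TimeLimitSuppressor(10);
        s.update("a", "1", 0);
        s.update("a", "2", 5);   // overwrites the buffered value for "a"
        s.update("b", "1", 6);
        s.advance(10);           // "a" was first buffered at t=0, so it is emitted now
        s.advance(16);           // "b" was first buffered at t=6, so it is emitted now
        System.out.println(s.emitted()); // [a=2, b=1]
    }
}
```

Three updates produce two downstream records in this model; without any caching or suppression, all three would be forwarded.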
>> >>
>> >> Hope this helps.
>> >>
>> >>
>> >> -Matthias
>> >>
>> >> On 11/6/18 9:49 AM, Jonathan Gordon wrote:
>> >>> I have a Kafka Streams app that I'm trying to upgrade from 0.10.2.1 to
>> >>> 0.11.0.3, but when I do, I notice that CPU goes way up and consumption
>> >>> goes down. A thread profile indicates that the most expensive task is
>> >>> during our aggregation, fetching from the cache.
>> >>>
>> >>> Thread profile with caching:
>> >>> https://imgur.com/l5VEsC2
>> >>>
>> >>> If I disable the cache, both performance and consumption are good, but
>> >>> we are producing every single aggregation modification, which is not
>> >>> what we want.
>> >>>
>> >>> Thread profile without caching:
>> >>> https://imgur.com/a/JK3nkou
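Caching is switched on and off through configuration rather than code. Below is a minimal sketch with plain `java.util.Properties`, assuming the `cache.max.bytes.buffering` and `commit.interval.ms` keys that Kafka Streams uses in these versions (the values of `StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG` and `StreamsConfig.COMMIT_INTERVAL_MS_CONFIG`). The helper class `StreamsCacheConfig`, the application id, and the bootstrap servers are placeholders, not taken from this thread.

```java
import java.util.Properties;

// Hypothetical helper; application.id and bootstrap.servers are placeholders.
public class StreamsCacheConfig {
    static Properties base() {
        Properties props = new Properties();
        // setProperty stores Strings, so getProperty can read them back.
        props.setProperty("application.id", "my-aggregation-app");
        props.setProperty("bootstrap.servers", "localhost:9092");
        return props;
    }

    // Caching off: every aggregate update is forwarded downstream.
    static Properties withoutCache() {
        Properties props = base();
        props.setProperty("cache.max.bytes.buffering", "0");
        return props;
    }

    // Caching on: updates are coalesced in a cache of cacheBytes and flushed
    // downstream when a commit happens (every commitIntervalMs) or on eviction.
    static Properties withCache(long cacheBytes, long commitIntervalMs) {
        Properties props = base();
        props.setProperty("cache.max.bytes.buffering", Long.toString(cacheBytes));
        props.setProperty("commit.interval.ms", Long.toString(commitIntervalMs));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(withoutCache().getProperty("cache.max.bytes.buffering")); // 0
        System.out.println(withCache(10 * 1024 * 1024L, 30_000L).getProperty("commit.interval.ms")); // 30000
    }
}
```

With caching on, a larger cache or a longer commit interval generally means fewer downstream records, at the cost of higher latency for each update.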
>> >>>
>> >>> I read this thread, which seems relevant:
>> >>>
>> >>> https://lists.apache.org/thread.html/2b44e74eaec7172b107bcff96861cf8b4837f55a44714f69d033cc2e@%3Cusers.kafka.apache.org%3E
>> >>>
>> >>> Notably: "Note, that caching was _not_ introduced to reduce the writes to
>> >>> RocksDB, but to reduce the writes to the changelog topic and to reduce
>> >>> the number of records sent downstream."
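The quoted point, that caching mainly reduces what gets forwarded downstream, can be seen in a small model. The sketch below is stdlib-only Java and hypothetical (`CoalescingCache` is not Kafka's `ThreadCache`); it assumes a bounded LRU write-back cache in which repeated updates to a key overwrite each other, and an entry is forwarded only when it is evicted or when the cache is flushed on commit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a write-back record cache; NOT Kafka's ThreadCache.
public class CoalescingCache {
    private final List<String> forwarded = new ArrayList<>();
    private final LinkedHashMap<String, String> entries;

    public CoalescingCache(int maxEntries) {
        // accessOrder = true gives LRU ordering; once the cache holds more than
        // maxEntries, the least recently used entry is forwarded and evicted.
        this.entries = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > maxEntries) {
                    forward(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    // Overwrites any cached value for the key, so repeated updates are coalesced.
    public void put(String key, String value) { entries.put(key, value); }

    // Models a commit: forward every cached entry, then empty the cache.
    public void flush() {
        for (Map.Entry<String, String> e : entries.entrySet()) {
            forward(e.getKey(), e.getValue());
        }
        entries.clear();
    }

    private void forward(String key, String value) { forwarded.add(key + "=" + value); }

    public List<String> forwarded() { return forwarded; }

    public static void main(String[] args) {
        CoalescingCache cache = new CoalescingCache(2);
        cache.put("a", "1");
        cache.put("a", "2"); // coalesced with the previous update to "a"
        cache.put("b", "1");
        cache.put("c", "1"); // over capacity: LRU entry "a" is evicted and forwarded
        cache.flush();       // commit: "b" and "c" are forwarded
        System.out.println(cache.forwarded()); // [a=2, b=1, c=1]
    }
}
```

Four updates yield three forwarded records here; with the cache disabled, all four would go downstream.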
>> >>>
>> >>> So how can we reduce the number of records sent downstream while
>> >>> maintaining the same performance characteristics that we have with
>> >>> caching turned off? Or put another way, how can I upgrade my app
>> >>> without taking a hit in performance or behavior?
>> >>>
>> >>> Thanks!
>> >>>
>> >>
>> >>
>>
>>
>
> --
> -- Guozhang
>


-- 
-- Guozhang
