kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: Memory / resource leak in 0.10.1.1 release
Date Fri, 30 Dec 2016 05:42:00 GMT
Hello Jon,

It is hard to tell, since I cannot see how your Aggregate() function is
implemented.

Note that the deserializer of transactionSerde is used in both `aggregate`
and `KStreamBuilder.stream`, while its serializer is only used in
`aggregate`. So if you suspect transactionSerde is the root cause, you can
narrow it down by reducing the topology to:


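KStream<String, SumRecord> transactionKStream =
    kStreamBuilder.stream(stringSerde, transactionSerde, TOPIC);

// Pass the serdes explicitly so that to() uses transactionSerde's
// serializer instead of the default value serde from the config.
transactionKStream.to(stringSerde, transactionSerde, TOPIC_2);

where TOPIC_2 names a second topic that must be created beforehand.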

This topology still exercises both the serializer and the deserializer of
transactionSerde. If it also leaks memory, then the leak is not related to
your aggregate function.
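To take Kafka out of the picture entirely, the serde can also be exercised in a tight loop on its own. The sketch below is JDK-only and hypothetical: `SumRecord`, `serialize` and `deserialize` are stand-ins (the real transactionSerde is not shown in this thread), so swap in calls to the real serializer and deserializer. It mirrors the pass-through topology above (one deserialize and one serialize per record) and prints the used heap after a GC hint at the end of each round; a number that keeps climbing round after round would point at the serde.

```java
import java.nio.charset.StandardCharsets;

public class SerdeRoundTrip {

    // Hypothetical stand-in for SumRecord: a couple of String and int fields.
    static final class SumRecord {
        final String key;
        final int count;

        SumRecord(String key, int count) {
            this.key = key;
            this.count = count;
        }
    }

    // Hypothetical serializer standing in for transactionSerde.serializer().
    static byte[] serialize(SumRecord r) {
        String json = "{\"key\":\"" + r.key + "\",\"count\":" + r.count + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical deserializer; this naive parse only handles the format written above.
    static SumRecord deserialize(byte[] bytes) {
        String json = new String(bytes, StandardCharsets.UTF_8);
        int k0 = json.indexOf("\"key\":\"") + 7;
        int k1 = json.indexOf('"', k0);
        int c0 = json.indexOf("\"count\":") + 8;
        int c1 = json.indexOf('}', c0);
        return new SumRecord(json.substring(k0, k1), Integer.parseInt(json.substring(c0, c1)));
    }

    // Rough used-heap figure; System.gc() is only a hint, so treat results as approximate.
    static long usedHeapAfterGc() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Deserialize then re-serialize, like the pass-through topology; returns total bytes written.
    static long roundTrip(int iterations) {
        byte[] input = serialize(new SumRecord("some-key", 1));
        long bytesWritten = 0;
        for (int i = 0; i < iterations; i++) {
            SumRecord r = deserialize(input);
            bytesWritten += serialize(r).length;
        }
        return bytesWritten;
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 5; round++) {
            roundTrip(1_000_000);
            System.out.printf("round %d: used heap after GC = %d KB%n", round, usedHeapAfterGc() / 1024);
        }
    }
}
```

If the heap figure levels off here but the reduced topology still grows, the serde alone is probably not the culprit.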


Guozhang


On Sun, Dec 25, 2016 at 4:15 AM, Jon Yeargers <jon.yeargers@cedexis.com>
wrote:

> I narrowed this problem down to this part of the topology (and yes, it's
> 100% repro - for me):
>
> KStream<String,SumRecord> transactionKStream =
>  kStreamBuilder.stream(stringSerde,transactionSerde,TOPIC);
>
> KTable<Windowed<String>, SumRecordCollector> ktAgg =
> transactionKStream.groupByKey().aggregate(
>         SumRecordCollector::new,
>         new Aggregate(),
>         TimeWindows.of(20 * 60 * 1000L),
>         collectorSerde, "table_stream");
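For reference, `TimeWindows.of(20 * 60 * 1000L)` defines tumbling (non-overlapping) 20-minute windows aligned to the epoch, and the windowed store keeps each key's windows until its retention period expires. The JDK-only sketch below does that arithmetic. The one-day retention is an assumption (believed to be the default in this release; `TimeWindows.until()` sets it explicitly), so check the javadoc for the version in use. `WindowMath` and its methods are hypothetical names for illustration.

```java
public class WindowMath {

    // Matches TimeWindows.of(20 * 60 * 1000L) in the topology above.
    static final long WINDOW_SIZE_MS = 20 * 60 * 1000L;

    // ASSUMPTION: one-day retention; verify against the Windows/TimeWindows javadoc.
    static final long ASSUMED_RETENTION_MS = 24 * 60 * 60 * 1000L;

    // Start of the tumbling window that contains timestampMs (windows aligned to the epoch).
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Approximate number of windows per key retained over the retention period (ceiling division).
    static long windowsPerKey(long retentionMs, long sizeMs) {
        return (retentionMs + sizeMs - 1) / sizeMs;
    }

    public static void main(String[] args) {
        long ts = 1_482_413_482_000L; // an arbitrary example timestamp
        long start = windowStart(ts, WINDOW_SIZE_MS);
        System.out.println("window [" + start + ", " + (start + WINDOW_SIZE_MS) + ")");
        System.out.println("windows per key: " + windowsPerKey(ASSUMED_RETENTION_MS, WINDOW_SIZE_MS));
    }
}
```

Under that assumption each key keeps about 72 windows, so the store holds roughly (distinct keys x 72) aggregates. Growth up to that bound is expected state rather than a leak; growth well past it is worth investigating.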
>
> Given that this is a pretty trivial, well-traveled piece of Kafka I can't
> imagine it has a memory leak.
>
> So I'm guessing that the serde I'm using is causing the problem somehow. The
> 'transactionSerde' just converts JSON to and from the 'SumRecord' object.
> That object is just a set of String and int fields, so nothing interesting
> there either.
>
> I'm attaching the two parts of the transactionSerde to see if anyone has
> suggestions on how to find / fix this.
>
>
>
> On Thu, Dec 22, 2016 at 9:26 AM, Jon Yeargers <jon.yeargers@cedexis.com>
> wrote:
>
>> Yes - that's the one. It's 100% reproducible (for me).
>>
>>
>> On Thu, Dec 22, 2016 at 8:03 AM, Damian Guy <damian.guy@gmail.com> wrote:
>>
>>> Hi Jon,
>>>
>>> Is this for the topology where you are doing something like:
>>>
>>> topology: kStream -> groupByKey.aggregate(minute) -> foreach
>>>                              \-> groupByKey.aggregate(hour) -> foreach
>>>
>>> I'm trying to understand how I could reproduce your problem. I've not
>>> seen any such issues with 0.10.1.1, but then I'm not sure what you are
>>> doing.
>>>
>>> Thanks,
>>> Damian
>>>
>>> On Thu, 22 Dec 2016 at 15:26 Jon Yeargers <jon.yeargers@cedexis.com>
>>> wrote:
>>>
>>> > I'm still hitting this leak with the released version of 0.10.1.1.
>>> >
>>> > The process's memory usage (%) grows over the course of 10-20 minutes,
>>> > and eventually the OS kills it.
>>> >
>>> > Messages like this appear in /var/log/messages:
>>> >
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.793692] java invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.798383] java cpuset=/ mems_allowed=0
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.801079] CPU: 0 PID: 9550 Comm: java Tainted: G            E   4.4.19-29.55.amzn1.x86_64 #1
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  0000000000000000 ffff88071c517a70 ffffffff812c958f ffff88071c517c58
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  0000000000000000 ffff88071c517b00 ffffffff811ce76d ffffffff8109db14
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  ffffffff810b2d91 0000000000000000 0000000000000010 ffffffff817d0fe9
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] Call Trace:
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff812c958f>] dump_stack+0x63/0x84
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff811ce76d>] dump_header+0x5e/0x1d8
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff8109db14>] ? set_next_entity+0xa4/0x710
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff810b2d91>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff81163ba5>] oom_kill_process+0x205/0x3d0
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff81164201>] out_of_memory+0x431/0x480
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff811692ce>] __alloc_pages_nodemask+0x91e/0xa60
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff811ad0b8>] alloc_pages_current+0x88/0x120
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff811604a4>] __page_cache_alloc+0xb4/0xc0
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff811627e8>] filemap_fault+0x188/0x3e0
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffffa0122cb6>] ext4_filemap_fault+0x36/0x50 [ext4]
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff8118a24d>] __do_fault+0x3d/0x70
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff8118e687>] handle_mm_fault+0xf27/0x1870
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff810b2d91>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff8105ea33>] __do_page_fault+0x183/0x3f0
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff8105ecc2>] do_page_fault+0x22/0x30
>>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]  [<ffffffff814e03d8>] page_fault+0x28/0x30
>>> >
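The kernel OOM killer in this log acts on the process's total memory, not on the Java heap alone. A JVM that exhausts its heap normally throws `java.lang.OutOfMemoryError` instead, so a kernel kill hints (without proving) that the growth may be outside the heap, or that `-Xmx` is set larger than the machine can back. A minimal JDK-only way to check is to log heap and non-heap usage from inside the application; `HeapSampler` below is a hypothetical helper built on the standard `java.lang.management` API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeapSampler {

    // One sample in MB. "Non-heap" here means JVM-managed areas such as metaspace and
    // the code cache; it does NOT include arbitrary native allocations.
    static String sample() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();
        return String.format("heap used=%dMB committed=%dMB max=%dMB, non-heap used=%dMB",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20,
                nonHeap.getUsed() >> 20);
    }

    // Logs a sample to stderr every periodSeconds on a daemon thread.
    static ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heap-sampler");
            t.setDaemon(true);
            return t;
        });
        exec.scheduleAtFixedRate(() -> System.err.println(sample()), 0, periodSeconds, TimeUnit.SECONDS);
        return exec;
    }

    public static void main(String[] args) throws InterruptedException {
        start(1);
        // In the real application, something like start(30) could be called next to KafkaStreams.start().
        Thread.sleep(3_500);
    }
}
```

If heap usage stays flat while the process's resident size (for example in `top`) keeps climbing until the kernel steps in, the growth is in native memory rather than in Java objects.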
>>>
>>
>>
>


-- 
-- Guozhang
