kafka-users mailing list archives

From Jon Yeargers <jon.yearg...@cedexis.com>
Subject Re: Memory / resource leak in 0.10.1.1 release
Date Sun, 25 Dec 2016 12:15:25 GMT
I narrowed this problem down to this part of the topology (and yes, it's
100% reproducible for me):

KStream<String, SumRecord> transactionKStream =
        kStreamBuilder.stream(stringSerde, transactionSerde, TOPIC);

KTable<Windowed<String>, SumRecordCollector> ktAgg =
        transactionKStream.groupByKey().aggregate(
                SumRecordCollector::new,
                new Aggregate(),
                TimeWindows.of(20 * 60 * 1000L),  // 20-minute windows, in ms
                collectorSerde, "table_stream");

Given that this is a pretty trivial, well-traveled piece of Kafka I can't
imagine it has a memory leak.

So I'm guessing that the serde I'm using is causing a problem somehow. The
'transactionSerde' just gets/sets JSON into the 'SumRecord' object.
That object is just a bunch of String and int fields, so nothing interesting
there either.

I'm attaching the two parts of the transactionSerde to see if anyone has
suggestions on how to find / fix this.
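
One way to narrow this down, offered only as a minimal sketch: log the JVM's own heap and non-heap figures next to the process memory that the OS reports. If the JVM numbers stay roughly flat while the process keeps growing until the kernel's oom-killer steps in, the growth is probably in native allocations that the JVM does not track here, rather than in objects created by the serde. The class name `MemoryProbe` and the sampling interval are made up for illustration; the calls are the standard `java.lang.management` API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of Kafka): samples JVM-tracked memory.
public class MemoryProbe {

    /** Returns current heap and non-heap usage, in MiB, as one log line. */
    static String snapshot() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        MemoryUsage nonHeap = bean.getNonHeapMemoryUsage();
        // ">> 20" converts bytes to MiB.
        return String.format(
                "heapUsed=%dMiB heapCommitted=%dMiB nonHeapUsed=%dMiB",
                heap.getUsed() >> 20,
                heap.getCommitted() >> 20,
                nonHeap.getUsed() >> 20);
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a few samples; in a real application this would run on a
        // background thread for the lifetime of the streams process.
        for (int i = 0; i < 3; i++) {
            System.out.println(snapshot());
            Thread.sleep(500);
        }
    }
}
```

Comparing these lines against the process RSS (for example from `top`) over the 10-20 minutes it takes to die should show which side is growing.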



On Thu, Dec 22, 2016 at 9:26 AM, Jon Yeargers <jon.yeargers@cedexis.com>
wrote:

> Yes - that's the one. It's 100% reproducible (for me).
>
>
> On Thu, Dec 22, 2016 at 8:03 AM, Damian Guy <damian.guy@gmail.com> wrote:
>
>> Hi Jon,
>>
>> Is this for the topology where you are doing something like:
>>
>> topology: kStream -> groupByKey.aggregate(minute) -> foreach
>>                              \-> groupByKey.aggregate(hour) -> foreach
>>
>> I'm trying to understand how I could reproduce your problem. I've not seen
>> any such issues with 0.10.1.1, but then I'm not sure what you are doing.
>>
>> Thanks,
>> Damian
>>
>> On Thu, 22 Dec 2016 at 15:26 Jon Yeargers <jon.yeargers@cedexis.com>
>> wrote:
>>
>> > I'm still hitting this leak with the released version of 0.10.1.1.
>> >
>> > Process mem % grows over the course of 10-20 minutes and eventually
>> > the OS kills it.
>> >
>> > Messages like this appear in /var/log/messages:
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.793692] java invoked
>> > oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.798383] java cpuset=/
>> > mems_allowed=0
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.801079] CPU: 0 PID: 9550
>> > Comm: java Tainted: G            E   4.4.19-29.55.amzn1.x86_64 #1
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] Hardware name:
>> > Xen HVM domU, BIOS 4.2.amazon 11/11/2016
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > 0000000000000000 ffff88071c517a70 ffffffff812c958f ffff88071c517c58
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > 0000000000000000 ffff88071c517b00 ffffffff811ce76d ffffffff8109db14
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > ffffffff810b2d91 0000000000000000 0000000000000010 ffffffff817d0fe9
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] Call Trace:
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff812c958f>] dump_stack+0x63/0x84
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff811ce76d>] dump_header+0x5e/0x1d8
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff8109db14>] ? set_next_entity+0xa4/0x710
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff810b2d91>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff81163ba5>] oom_kill_process+0x205/0x3d0
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff81164201>] out_of_memory+0x431/0x480
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff811692ce>] __alloc_pages_nodemask+0x91e/0xa60
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff811ad0b8>] alloc_pages_current+0x88/0x120
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff811604a4>] __page_cache_alloc+0xb4/0xc0
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff811627e8>] filemap_fault+0x188/0x3e0
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffffa0122cb6>] ext4_filemap_fault+0x36/0x50 [ext4]
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff8118a24d>] __do_fault+0x3d/0x70
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff8118e687>] handle_mm_fault+0xf27/0x1870
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff810b2d91>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff8105ea33>] __do_page_fault+0x183/0x3f0
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff8105ecc2>] do_page_fault+0x22/0x30
>> >
>> > Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
>> > [<ffffffff814e03d8>] page_fault+0x28/0x30
>> >
>>
>
>
