I've narrowed this problem down to this part of the topology (and yes, it reproduces 100% of the time, for me):

KStream<String, SumRecord> transactionKStream = kStreamBuilder.stream(stringSerde, transactionSerde, TOPIC);

KTable<Windowed<String>, SumRecordCollector> ktAgg = transactionKStream.groupByKey().aggregate(
        SumRecordCollector::new,
        new Aggregate(),
        TimeWindows.of(20 * 60 * 1000L),  // 20-minute windows, size given in milliseconds
        collectorSerde, "table_stream");

Given that this is a pretty trivial, well-traveled piece of Kafka, I can't imagine it has a memory leak.

So I'm guessing that the serde I'm using is causing the problem somehow. The 'transactionSerde' just converts JSON to and from the 'SumRecord' object. That object is just a bunch of String and int fields, so nothing interesting there either.

I'm attaching the two parts of the transactionSerde to see if anyone has suggestions on how to find / fix this.
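
One way to test the serde theory, as a rough sketch rather than a confirmed diagnosis: sample the JVM's heap usage alongside the process's resident memory. The oom-killer reacts to the system running out of memory, not to the Java heap limit, so if resident memory keeps climbing while heap usage stays roughly flat, the growth is probably outside the heap, and objects created by the serde become a less likely cause. The class and method names below (MemoryProbe, heapUsedBytes, residentKb, parseVmRssKb) are made up for illustration, and reading /proc/self/status assumes Linux.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Kafka: compares JVM heap use with process RSS.
public class MemoryProbe {

    /** Bytes of Java heap currently in use, as reported by the JVM itself. */
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Parses a line such as "VmRSS:    123456 kB"; returns kB, or -1 for any other line. */
    static long parseVmRssKb(String line) {
        if (!line.startsWith("VmRSS:")) {
            return -1L;
        }
        String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
        return Long.parseLong(parts[0]);
    }

    /** Resident set size of this process in kB (Linux only), or -1 if unavailable. */
    static long residentKb() {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            return -1L;
        }
        try {
            for (String line : Files.readAllLines(status)) {
                long kb = parseVmRssKb(line);
                if (kb >= 0) {
                    return kb;
                }
            }
        } catch (IOException e) {
            // Treat an unreadable status file the same as a missing one.
        }
        return -1L;
    }

    public static void main(String[] args) throws InterruptedException {
        // Number of one-second samples to print; defaults to 3.
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        for (int i = 0; i < samples; i++) {
            long rssKb = residentKb();
            String rss = rssKb < 0 ? "n/a" : (rssKb / 1024) + "MB";
            System.out.printf("heapUsed=%dMB rss=%s%n",
                    heapUsedBytes() / (1024 * 1024), rss);
            Thread.sleep(1000L);
        }
    }
}
```

The probe only measures the JVM it runs in, so in practice the two calls would be logged periodically from inside the Streams application (for example from a scheduled thread) while the leak is reproducing.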



On Thu, Dec 22, 2016 at 9:26 AM, Jon Yeargers <jon.yeargers@cedexis.com> wrote:
Yes - that's the one. It's 100% reproducible (for me).


On Thu, Dec 22, 2016 at 8:03 AM, Damian Guy <damian.guy@gmail.com> wrote:
Hi Jon,

Is this for the topology where you are doing something like:

topology: kStream -> groupByKey.aggregate(minute) -> foreach
                             \-> groupByKey.aggregate(hour) -> foreach

I'm trying to understand how I could reproduce your problem. I've not seen
any such issues with 0.10.1.1, but then I'm not sure what you are doing.

Thanks,
Damian

On Thu, 22 Dec 2016 at 15:26 Jon Yeargers <jon.yeargers@cedexis.com> wrote:

> I'm still hitting this leak with the released version of 0.10.1.1.
>
> Process mem % grows over the course of 10-20 minutes and eventually the OS
> kills it.
>
> Messages like this appear in /var/log/messages:
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.793692] java invoked
> oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.798383] java cpuset=/
> mems_allowed=0
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.801079] CPU: 0 PID: 9550
> Comm: java Tainted: G            E   4.4.19-29.55.amzn1.x86_64 #1
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] Hardware name:
> Xen HVM domU, BIOS 4.2.amazon 11/11/2016
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> 0000000000000000 ffff88071c517a70 ffffffff812c958f ffff88071c517c58
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> 0000000000000000 ffff88071c517b00 ffffffff811ce76d ffffffff8109db14
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> ffffffff810b2d91 0000000000000000 0000000000000010 ffffffff817d0fe9
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072] Call Trace:
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff812c958f>] dump_stack+0x63/0x84
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff811ce76d>] dump_header+0x5e/0x1d8
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff8109db14>] ? set_next_entity+0xa4/0x710
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff810b2d91>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff81163ba5>] oom_kill_process+0x205/0x3d0
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff81164201>] out_of_memory+0x431/0x480
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff811692ce>] __alloc_pages_nodemask+0x91e/0xa60
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff811ad0b8>] alloc_pages_current+0x88/0x120
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff811604a4>] __page_cache_alloc+0xb4/0xc0
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff811627e8>] filemap_fault+0x188/0x3e0
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffffa0122cb6>] ext4_filemap_fault+0x36/0x50 [ext4]
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff8118a24d>] __do_fault+0x3d/0x70
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff8118e687>] handle_mm_fault+0xf27/0x1870
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff810b2d91>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff8105ea33>] __do_page_fault+0x183/0x3f0
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff8105ecc2>] do_page_fault+0x22/0x30
>
> Dec 22 13:31:22 ip-172-16-101-108 kernel: [2989844.805072]
> [<ffffffff814e03d8>] page_fault+0x28/0x30
>