kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thameem Ansari <thame...@gmail.com>
Subject Re: Kafka streams (2.1.1) - org.rocksdb.RocksDBException:Too many open files
Date Thu, 04 Jul 2019 03:58:56 GMT
Sorry for reposting the previous message as the images didn’t come thru.. pasting as text.


I have changed both system and user limits. 

To completely isolate the problem, I have tried the application in Centos 7 environment. Set
the ulimit to 1million and system limits to 10million open files. Now 3 kafka nodes are running
in separate servers each and streaming application is running in a dedicated VM. Now the application
is not explicitly throwing “too many open files” error but automatically aborted with
the message 

terminate called after throwing an instance of 'std::system_error'
  what():  Resource temporarily unavailable
./bin/start.sh: line 42:  1100 Aborted 

Here is the last few lines from strace output which shows the aborted message. 

25721 14:58:35 open("/home/devops/data/kafka-streams/RawLog_Processor/393_7/MAIL.INCOMING_FILTER_BY_USER_MAIL_INCOMING_DOMAIN_TRAFFIC_DETAIL.DETECTED_CONTENT_FILTER_RECIPIENT_DOMAIN/MAIL.INCOMING_FILTER_BY_USER_MAIL_INCOMING_DOMAIN_TRAFFIC_DETAIL.DETECTED_CONTENT_FILTER_RECIPIENT_DOMAIN.1555200000/MANIFEST-000007",
O_RDONLY|O_CLOEXEC) = 12505
25721 14:58:35 open("/sys/devices/virtual/block/dm-2/queue/logical_block_size", O_RDONLY)
= 12506
25721 14:58:35 read(12506, "512\n", 4096) = 4
25721 14:58:35 close(12506)             = 0
25721 14:58:35 write(12502, "s.advise_random_on_open: 0\n2019/"..., 4096) = 4096
25721 14:58:35 write(12502, "ions.comparator: leveldb.Bytewis"..., 4096) = 4096
25721 14:58:35 read(12505, "V\371\270\370\34\0\1\1\32leveldb.BytewiseCompara"..., 32768) =
192
25721 14:58:35 read(12505, "", 28672)   = 0
25721 14:58:35 close(12505)             = 0
17701 14:58:35 open("/home/devops/data/kafka-streams/RawLog_Processor/393_7/MAIL.INCOMING_FILTER_BY_USER_MAIL_INCOMING_DOMAIN_TRAFFIC_DETAIL.DETECTED_CONTENT_FILTER_RECIPIENT_DOMAIN/MAIL.INCOMING_FILTER_BY_USER_MAIL_INCOMING_DOMAIN_TRAFFIC_DETAIL.DETECTED_CONTENT_FILTER_RECIPIENT_DOMAIN.1555200000/000006.sst",
O_RDONLY|O_CLOEXEC) = 12505
17702 14:58:35 +++ exited with 0 +++
25721 14:58:35 write(2, "terminate called after throwing "..., 48) = 48
25721 14:58:35 write(2, "std::system_error", 17) = 17
25721 14:58:35 write(2, "'\n", 2)       = 2
25721 14:58:35 write(2, "  what():  ", 11) = 11
25721 14:58:35 write(2, "Resource temporarily unavailable", 32) = 32
25721 14:58:35 write(2, "\n", 1)        = 1
17701 14:58:35 open("/sys/devices/virtual/block/dm-2/queue/logical_block_size", O_RDONLY)
= 12506
25721 14:58:35 write(12502, "ction_dynamic_level_bytes: 0\n201"..., 3672) = 3672
25721 14:58:35 --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=25697, si_uid=1000}
---
17701 14:58:35 read(12506,  <unfinished ...>
17701 14:58:36 +++ killed by SIGABRT +++
17700 14:58:36 +++ killed by SIGABRT +++
17699 14:58:36 +++ killed by SIGABRT +++


As I can see from the open files they are way lower (45201) than the max limit and hence can
we rule out the “open files” is the root cause? 

I have also noticed there is lot of EAGAIN (count 3334) and associated “Resource temporarily
unavailable” messages as well



25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25732 14:49:23 close(59)                = 0
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(59)                = 0
25721 14:49:23 write(50, "\0", 1)       = 1
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(59)                = 0
25721 14:49:23 read(17, "PK\3\4\n\0\0\10\10\0\343SHNM\n4>Q\2\0\0\265\5\0\0A\0\0\0", 30)
= 30
25721 14:49:23 read(17, "\245T]O\23A\24=\323\226\326-\253\255\250\365\v\4\265ji\221E\344\305\200M\f\201\304\244j"...,
593) = 593
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(59)                = 0
25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25732 14:49:23 close(59)                = 0
25721 14:49:23 read(49, "\0", 16)       = 1
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(59)                = 0
25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25732 14:49:23 close(59)                = 0
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(59)                = 0
25721 14:49:23 read(54, "\0\1dx", 4)    = 4
25721 14:49:23 read(54, "\0\0\0\v\0\0\0\0\0\0\0\1dj\0\0\0\0\1\212\0\201RawLog_Pro"..., 91256)
= 91256
25721 14:49:23 read(54, 0x7f7044003b80, 4) = -1 EAGAIN (Resource temporarily unavailable)
25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25721 14:49:23 write(35, "2019-07-03 14:49:23.258 [rawLP1-"..., 219 <unfinished ...>
25732 14:49:23 close(59)                = 0
25721 14:49:23 <... write resumed> )    = 219
25721 14:49:23 write(1, "2019-07-03 14:49:23.258 [rawLP1-"..., 219) = 219
25721 14:49:23 write(50, "\0", 1)       = 1
25721 14:49:23 read(49, "\0", 16)       = 1
25708 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 60
25708 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25708 14:49:23 close(59)                = 0
25709 14:49:23 read(60, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(60)                = 0
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(59)                = 0
25708 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25708 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25708 14:49:23 close(59)                = 0
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25709 14:49:23 close(59)                = 0
25732 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25732 14:49:23 read(59, "9223372036854775807\n", 4096) = 20
25732 14:49:23 close(59)                = 0
25709 14:49:23 open("/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = 59
25709 14:49:23 read(59, "9223372036854775807\n", 4096) = 20

But unable to come to any conclusion from these findings. Any insight is appreciated. 

Thanks
Thameem

> On Jul 4, 2019, at 9:18 AM, Thameem Ansari <thameema@gmail.com> wrote:
> 
> <strace2.png>
> 
> 
> But unable to come to any conclusion from these findings. Any insight is appreciated.

> 
> Thanks
> Thameem


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message