lucene-solr-user mailing list archives

From Tomás Fernández Löbbe <tomasflo...@gmail.com>
Subject Re: Re: High CPU usage with Solr 7.7.0
Date Wed, 27 Feb 2019 18:34:00 GMT
Maybe a thread dump would be useful, if you still have an instance running
on 7.7.
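
To make the thread dump easier to read against the top output quoted below, here is a minimal sketch of how the two busy threads could be located in it. It assumes that top was running in thread mode (the "Threads: 74 total" line suggests so), so that 10209 and 10214 are Linux thread IDs, and that the dump is taken with the JDK's jstack tool. The function name tid_to_nid, the placeholder <solr-pid> and the file name solr-threads.txt are illustrative only and do not come from the original report.

```shell
#!/bin/sh
# Convert a decimal Linux thread ID (as shown by top in thread mode) into
# the hexadecimal "nid=0x..." form that a JVM thread dump prints per thread.
tid_to_nid() {
    printf 'nid=0x%x\n' "$1"
}

# The two thread IDs reported at ~100% CPU in the quoted top output.
for tid in 10209 10214; do
    tid_to_nid "$tid"
done

# Hypothetical follow-up on the Solr host. <solr-pid> is a placeholder for
# the Solr java process ID; solr-threads.txt is an arbitrary file name.
#   jstack <solr-pid> > solr-threads.txt
#   grep -A 20 "$(tid_to_nid 10209)" solr-threads.txt
```

jstack usually has to run as the same user as the JVM (here presumably solr). The stack frames printed after the matching nid line show what the spinning thread is executing.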

On Wed, Feb 27, 2019 at 7:28 AM Lukas Weiss <Lukas.Weiss@raiffeisen.it>
wrote:

> I can confirm this. Downgrading to 7.6.0 solved the issue.
> Thanks for the hint.
>
>
>
> From:    "Joe Obernberger" <joseph.obernberger@gmail.com>
> To:      solr-user@lucene.apache.org, "Lukas Weiss" <Lukas.Weiss@raiffeisen.it>
> Date:    27 Feb 2019 15:59
> Subject: Re: High CPU usage with Solr 7.7.0
>
>
>
> Just to add to this: we upgraded to 7.7.0 and saw very high CPU usage
> on multi-core boxes, sustained in the 1200% range. We then switched to
> 7.6.0 (no other configuration changes) and the problem went away.
>
> We have a 40-node cluster, and all 40 nodes had high CPU usage with 3
> indexes stored on HDFS.
>
> -Joe
>
> On 2/27/2019 5:04 AM, Lukas Weiss wrote:
> > Hello,
> >
> > we recently updated our Solr server from 6.6.5 to 7.7.0. Since then, we
> > have had problems with the server's CPU usage.
> > We have two Solr cores configured, but even if we clear all indexes and do
> > not start the indexing process, we see 100% CPU usage for both cores.
> >
> > Here's what our top says:
> >
> > root@solr:~ # top
> > top - 09:25:24 up 17:40,  1 user,  load average: 2,28, 2,56, 2,68
> > Threads:  74 total,   3 running,  71 sleeping,   0 stopped,   0 zombie
> > %Cpu0  :100,0 us,  0,0 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> > %Cpu1  :100,0 us,  0,0 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> > %Cpu2  : 11,3 us,  1,0 sy,  0,0 ni, 86,7 id,  0,7 wa,  0,0 hi,  0,3 si,  0,0 st
> > %Cpu3  :  3,0 us,  3,0 sy,  0,0 ni, 93,7 id,  0,3 wa,  0,0 hi,  0,0 si,  0,0 st
> > KiB Mem :  8388608 total,  7859168 free,   496744 used,    32696 buff/cache
> > KiB Swap:  2097152 total,  2097152 free,        0 used.  7859168 avail Mem
> >
> >   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND   P
> > 10209 solr      20   0 6138468 452520  25740 R 99,9  5,4  29:43.45 java -server -Xms1024m -Xmx1024m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=4 + 24
> > 10214 solr      20   0 6138468 452520  25740 R 99,9  5,4  28:42.91 java -server -Xms1024m -Xmx1024m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=4 + 25
> >
> > The Solr server is installed on Debian Stretch 9.8 (64-bit) in a
> > dedicated Linux LXC container.
> >
> > Some more server info:
> >
> > root@solr:~ # java -version
> > openjdk version "1.8.0_181"
> > OpenJDK Runtime Environment (build 1.8.0_181-8u181-b13-2~deb9u1-b13)
> > OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
> >
> > root@solr:~ # free -m
> >                total        used        free      shared  buff/cache   available
> > Mem:           8192         484        7675         701          31        7675
> > Swap:          2048           0        2048
> >
> > We also found something strange: if we run strace on the main process,
> > we see a continuous stream of connection timeouts:
> >
> > root@solr:~ # strace -F -p 4136
> > strace: Process 4136 attached with 48 threads
> > strace: [ Process PID=11089 runs in x32 mode. ]
> > [pid  4937] epoll_wait(139,  <unfinished ...>
> > [pid  4936] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4909] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4618] epoll_wait(136,  <unfinished ...>
> > [pid  4576] futex(0x7ff61ce66474, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
> > [pid  4279] futex(0x7ff61ce62b34, FUTEX_WAIT_PRIVATE, 2203, NULL <unfinished ...>
> > [pid  4244] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4227] futex(0x7ff56c71ae14, FUTEX_WAIT_PRIVATE, 2237, NULL <unfinished ...>
> > [pid  4243] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4228] futex(0x7ff5608331a4, FUTEX_WAIT_PRIVATE, 2237, NULL <unfinished ...>
> > [pid  4208] futex(0x7ff61ce63e54, FUTEX_WAIT_PRIVATE, 5, NULL <unfinished ...>
> > [pid  4205] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4204] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4196] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4195] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4194] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4193] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4187] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
> > [pid  4180] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4179] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4177] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4174] accept(133,  <unfinished ...>
> > [pid  4173] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4172] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4171] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
> > [pid  4165] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4164] futex(0x7ff61c1f5054, FUTEX_WAIT_PRIVATE, 3, NULL <unfinished ...>
> > [pid  4163] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4162] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4161] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4160] futex(0x7ff623d52c20, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff <unfinished ...>
> > [pid  4159] futex(0x7ff61c1e9d54, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
> > [pid  4158] futex(0x7ff61c1b7f54, FUTEX_WAIT_PRIVATE, 15, NULL <unfinished ...>
> > [pid  4157] futex(0x7ff61c1b5554, FUTEX_WAIT_PRIVATE, 19, NULL <unfinished ...>
> > [pid  4156] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4155] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid  4153] futex(0x7ff61c06c754, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
> > [pid  4152] futex(0x7ff61c06ab54, FUTEX_WAIT_PRIVATE, 3, NULL <unfinished ...>
> > [pid  4151] futex(0x7ff61c068f54, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
> > [pid  4150] futex(0x7ff61c067354, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
> > [pid  4148] futex(0x7ff61c024a54, FUTEX_WAIT_PRIVATE, 403, NULL <unfinished ...>
> > [pid  4165] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564856, tv_nsec=849859736}, 0xffffffff <unfinished ...>
> > [pid  4147] futex(0x7ff61c022e54, FUTEX_WAIT_PRIVATE, 415, NULL <unfinished ...>
> > [pid  4146] futex(0x7ff61c021254, FUTEX_WAIT_PRIVATE, 397, NULL <unfinished ...>
> > [pid  4145] futex(0x7ff61c01f654, FUTEX_WAIT_PRIVATE, 405, NULL <unfinished ...>
> > [pid  4144] futex(0x7ff61c00e354, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
> > [pid  4136] futex(0x7ff624b729d0, FUTEX_WAIT, 4144, NULL <unfinished ...>
> > [pid  4165] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564856, tv_nsec=900162344}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564856, tv_nsec=950365105}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=586325}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=50791977}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=100997890}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=151206817}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=201402531}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=251616284}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=301813556}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=352036802}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=402239182}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=452439835}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=502635489}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=552844020}, 0xffffffff <unfinished ...>
> > [pid  4156] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4156] futex(0x7ff61c1aba28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4156] futex(0x7ff61c1aba54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564858, tv_nsec=506449064}, 0xffffffff <unfinished ...>
> > [pid  4165] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=603013734}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=653149664}, 0xffffffff^Cstrace: Process 4136 detached
> > strace: Process 4144 detached
> > strace: Process 4145 detached
> > strace: Process 4146 detached
> > strace: Process 4147 detached
> > strace: Process 4148 detached
> > strace: Process 4150 detached
> > strace: Process 4151 detached
> > strace: Process 4152 detached
> > strace: Process 4153 detached
> > ....
> >
> >
> > Could you help us to determine what's wrong with our setup?
> >
> > Thank you very much,
> >
> > Kind regards
> > Lukas Weiss
> >
> >
>
>
