lucene-solr-user mailing list archives

From Lukas Weiss <Lukas.We...@raiffeisen.it>
Subject Reply: Re: High CPU usage with Solr 7.7.0
Date Wed, 27 Feb 2019 15:28:01 GMT
I can confirm this. Downgrading to 7.6.0 solved the issue.
Thanks for the hint.



From:    "Joe Obernberger" <joseph.obernberger@gmail.com>
To:      solr-user@lucene.apache.org, "Lukas Weiss" <Lukas.Weiss@raiffeisen.it>
Date:    27.02.2019 15:59
Subject: Re: High CPU usage with Solr 7.7.0



Just to add to this: we upgraded to 7.7.0 and saw very high CPU usage
on multi-core boxes, sustained in the 1200% range.  We then switched to
7.6.0 (no other configuration changes) and the problem went away.

We have a 40 node cluster and all 40 nodes had high CPU usage with 3 
indexes stored on HDFS.

-Joe

On 2/27/2019 5:04 AM, Lukas Weiss wrote:
> Hello,
>
> we recently updated our Solr server from 6.6.5 to 7.7.0. Since then, we
> have problems with the server's CPU usage.
> We have two Solr cores configured, but even if we clear all indexes and do
> not start the index process, we see 100% CPU usage for both cores.
>
> Here's what our top says:
>
> root@solr:~ # top
> top - 09:25:24 up 17:40,  1 user,  load average: 2,28, 2,56, 2,68
> Threads:  74 total,   3 running,  71 sleeping,   0 stopped,   0 zombie
> %Cpu0  :100,0 us,  0,0 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> %Cpu1  :100,0 us,  0,0 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
> %Cpu2  : 11,3 us,  1,0 sy,  0,0 ni, 86,7 id,  0,7 wa,  0,0 hi,  0,3 si,  0,0 st
> %Cpu3  :  3,0 us,  3,0 sy,  0,0 ni, 93,7 id,  0,3 wa,  0,0 hi,  0,0 si,  0,0 st
> KiB Mem :  8388608 total,  7859168 free,   496744 used,    32696 buff/cache
> KiB Swap:  2097152 total,  2097152 free,        0 used.  7859168 avail Mem
>
>
>    PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                P
> 10209 solr      20   0 6138468 452520  25740 R 99,9  5,4  29:43.45 java -server -Xms1024m -Xmx1024m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=4 + 24
> 10214 solr      20   0 6138468 452520  25740 R 99,9  5,4  28:42.91 java -server -Xms1024m -Xmx1024m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=4 + 25
>
> The Solr server is installed on Debian Stretch 9.8 (64-bit) in a dedicated
> Linux LXC container.
>
> Some more server info:
>
> root@solr:~ # java -version
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (build 1.8.0_181-8u181-b13-2~deb9u1-b13)
> OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
>
> root@solr:~ # free -m
>                total        used        free      shared  buff/cache   available
> Mem:           8192         484        7675         701          31        7675
> Swap:          2048           0        2048
>
> We also found something strange: when we strace the main process, we see
> many recurring connection timeouts:
>
> root@solr:~ # strace -F -p 4136
> strace: Process 4136 attached with 48 threads
> strace: [ Process PID=11089 runs in x32 mode. ]
> [pid  4937] epoll_wait(139,  <unfinished ...>
> [pid  4936] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4909] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4618] epoll_wait(136,  <unfinished ...>
> [pid  4576] futex(0x7ff61ce66474, FUTEX_WAIT_PRIVATE, 1, NULL
> <unfinished ...>
> [pid  4279] futex(0x7ff61ce62b34, FUTEX_WAIT_PRIVATE, 2203, NULL
> <unfinished ...>
> [pid  4244] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4227] futex(0x7ff56c71ae14, FUTEX_WAIT_PRIVATE, 2237, NULL
> <unfinished ...>
> [pid  4243] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4228] futex(0x7ff5608331a4, FUTEX_WAIT_PRIVATE, 2237, NULL
> <unfinished ...>
> [pid  4208] futex(0x7ff61ce63e54, FUTEX_WAIT_PRIVATE, 5, NULL
> <unfinished ...>
> [pid  4205] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4204] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4196] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4195] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4194] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4193] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4187] restart_syscall(<... resuming interrupted restart_syscall ...>
> <unfinished ...>
> [pid  4180] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4179] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4177] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4174] accept(133,  <unfinished ...>
> [pid  4173] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4172] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4171] restart_syscall(<... resuming interrupted restart_syscall ...>
> <unfinished ...>
> [pid  4165] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4164] futex(0x7ff61c1f5054, FUTEX_WAIT_PRIVATE, 3, NULL
> <unfinished ...>
> [pid  4163] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4162] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4161] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4160] futex(0x7ff623d52c20,
> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff
> <unfinished ...>
> [pid  4159] futex(0x7ff61c1e9d54, FUTEX_WAIT_PRIVATE, 7, NULL
> <unfinished ...>
> [pid  4158] futex(0x7ff61c1b7f54, FUTEX_WAIT_PRIVATE, 15, NULL
> <unfinished ...>
> [pid  4157] futex(0x7ff61c1b5554, FUTEX_WAIT_PRIVATE, 19, NULL
> <unfinished ...>
> [pid  4156] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4155] restart_syscall(<... resuming interrupted futex ...>
> <unfinished ...>
> [pid  4153] futex(0x7ff61c06c754, FUTEX_WAIT_PRIVATE, 7, NULL
> <unfinished ...>
> [pid  4152] futex(0x7ff61c06ab54, FUTEX_WAIT_PRIVATE, 3, NULL
> <unfinished ...>
> [pid  4151] futex(0x7ff61c068f54, FUTEX_WAIT_PRIVATE, 7, NULL
> <unfinished ...>
> [pid  4150] futex(0x7ff61c067354, FUTEX_WAIT_PRIVATE, 7, NULL
> <unfinished ...>
> [pid  4148] futex(0x7ff61c024a54, FUTEX_WAIT_PRIVATE, 403, NULL
> <unfinished ...>
> [pid  4165] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection
> timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564856, tv_nsec=849859736}, 0xffffffff <unfinished ...>
> [pid  4147] futex(0x7ff61c022e54, FUTEX_WAIT_PRIVATE, 415, NULL
> <unfinished ...>
> [pid  4146] futex(0x7ff61c021254, FUTEX_WAIT_PRIVATE, 397, NULL
> <unfinished ...>
> [pid  4145] futex(0x7ff61c01f654, FUTEX_WAIT_PRIVATE, 405, NULL
> <unfinished ...>
> [pid  4144] futex(0x7ff61c00e354, FUTEX_WAIT_PRIVATE, 1, NULL
> <unfinished ...>
> [pid  4136] futex(0x7ff624b729d0, FUTEX_WAIT, 4144, NULL
> <unfinished ...>
> [pid  4165] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed
> out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564856, tv_nsec=900162344}, 0xffffffff) = -1 ETIMEDOUT
> (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564856, tv_nsec=950365105}, 0xffffffff) = -1 ETIMEDOUT
> (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564857, tv_nsec=586325}, 0xffffffff) = -1 ETIMEDOUT
> (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564857, tv_nsec=50791977}, 0xffffffff) = -1 ETIMEDOUT
> (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564857, tv_nsec=100997890}, 0xffffffff) = -1 ETIMEDOUT
> (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564857, tv_nsec=151206817}, 0xffffffff) = -1 ETIMEDOUT
> (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564857, tv_nsec=201402531}, 0xffffffff) = -1 ETIMEDOUT
> (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564857, tv_nsec=251616284}, 0xffffffff) = -1 ETIMEDOUT
> (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564857, tv_nsec=301813556}, 0xffffffff) = -1 ETIMEDOUT
> (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564857, tv_nsec=352036802}, 0xffffffff) = -1 ETIMEDOUT
> (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564857, tv_nsec=402239182}, 0xffffffff) = -1 ETIMEDOUT
> (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564857, tv_nsec=452439835}, 0xffffffff) = -1 ETIMEDOUT
> (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564857, tv_nsec=502635489}, 0xffffffff) = -1 ETIMEDOUT
> (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564857, tv_nsec=552844020}, 0xffffffff <unfinished ...>
> [pid  4156] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection
> timed out)
> [pid  4156] futex(0x7ff61c1aba28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4156] futex(0x7ff61c1aba54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564858, tv_nsec=506449064}, 0xffffffff <unfinished ...>
> [pid  4165] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed
> out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564857, tv_nsec=603013734}, 0xffffffff) = -1 ETIMEDOUT
> (Connection timed out)
> [pid  4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1,
> {tv_sec=32564857, tv_nsec=653149664}, 0xffffffff^C
> strace: Process 4136 detached
> strace: Process 4144 detached
> strace: Process 4145 detached
> strace: Process 4146 detached
> strace: Process 4147 detached
> strace: Process 4148 detached
> strace: Process 4150 detached
> strace: Process 4151 detached
> strace: Process 4152 detached
> strace: Process 4153 detached
> ....
>
>
> Could you help us to determine what's wrong with our setup?
>
> Thank you very much,
>
> Kind regards
> Lukas Weiss
>
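
A note on reading the strace output quoted above: strace prints the generic description "Connection timed out" for ETIMEDOUT, but on a futex call that error only means a timed wait expired, not that a network connection failed. The Python sketch below is not part of the original thread. It takes four of the deadlines logged for thread 4165 and computes the gaps between them. The names SAMPLE and deadline_gaps_ms are illustrative assumptions, and the wrapped strace lines have been joined by hand.

```python
import re

# Four futex deadlines for thread 4165, copied from the strace output
# above (wrapped lines joined so each call sits on one line).
SAMPLE = """\
[pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564856, tv_nsec=849859736}, 0xffffffff <unfinished ...>
[pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564856, tv_nsec=900162344}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564856, tv_nsec=950365105}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=586325}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
"""

# Matches a futex call that carries a timespec and captures the thread id,
# seconds and nanoseconds of the deadline.
DEADLINE = re.compile(
    r"\[pid\s+(\d+)\] futex\(.*?\{tv_sec=(\d+), tv_nsec=(\d+)\}"
)


def deadline_gaps_ms(trace, pid):
    """Return the gaps (in ms) between successive futex deadlines of one thread."""
    stamps = [
        int(sec) * 1_000_000_000 + int(nsec)
        for tid, sec, nsec in DEADLINE.findall(trace)
        if int(tid) == pid
    ]
    return [(later - earlier) / 1e6 for earlier, later in zip(stamps, stamps[1:])]


gaps = deadline_gaps_ms(SAMPLE, 4165)
print([round(g, 1) for g in gaps])  # prints [50.3, 50.2, 50.2]
```

The gaps come out at roughly 50 ms, which would be consistent with a thread that wakes on a timer and goes back to sleep. That pattern alone does not show which threads are keeping two CPUs at 100%.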

