lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: improving search response time
Date Wed, 18 Aug 2010 16:27:47 GMT
Most of your time is spent doing the query itself, which, in light of 
the other information you provided, does not surprise me.  With 12GB of 
RAM and 9GB dedicated to the java heap, the available RAM for disk 
caching is pretty low, especially if Solr is actually using all 9GB.
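
As a rough check on where the time goes, the per-component "process" times from the debugQuery timing section quoted below can be turned into shares of the total. This is only a sketch; the numbers are copied from that single quoted query, and the helper name `component_shares` is made up for illustration.

```python
# Per-component "process" times in milliseconds, copied from the
# quoted debugQuery output (components reporting 0.0 are omitted).
process_ms = {
    "QueryComponent": 4734.0,
    "HighlightComponent": 231.0,
    "DebugComponent": 501.0,
}
total_process_ms = 5467.0  # the "process" total reported by Solr


def component_shares(times, total):
    """Return each component's fraction of the total process time."""
    return {name: ms / total for name, ms in times.items()}


shares = component_shares(process_ms, total_process_ms)
# Print the components from most to least expensive.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1%}")
```

For this one query, QueryComponent accounts for roughly 87% of the process time, with highlighting and debug output making up the rest.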

Since your index is 60GB, the system is most likely I/O bound.  
Available memory for disk cache is the best way to make Solr fast.  If 
you increased to 16GB RAM, you'd probably see some performance 
increase.  Going to 32GB would be better, and 64GB would let your system 
load nearly the entire index into the disk cache.
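
To put numbers on that, here is a back-of-the-envelope sketch of how much of a 60GB index could fit in the disk cache at each RAM size. It assumes the full 9GB heap is in use and ignores memory taken by the OS and other processes, so the real figures would be somewhat lower. The function name is hypothetical.

```python
INDEX_GB = 60  # index size mentioned in the thread
HEAP_GB = 9    # -Xmx9000m, rounded; assumes the heap is fully used


def cacheable_fraction(ram_gb, heap_gb=HEAP_GB, index_gb=INDEX_GB):
    """Upper bound on the share of the index that fits in the OS disk cache.

    Ignores memory used by the OS and other processes.
    """
    free_gb = max(ram_gb - heap_gb, 0)
    return min(free_gb / index_gb, 1.0)


for ram in (12, 16, 32, 64):
    print(f"{ram}GB RAM -> at most {cacheable_fraction(ram):.0%} of the index cached")
```

Under these assumptions, 12GB of RAM leaves room for about 5% of the index, while 64GB leaves room for a little over 90%.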

Is matchAll possibly an aggregated field with information copied from 
the other fields that you are searching?  If so, especially since you 
are using dismax, you'd want to strongly consider dropping it entirely, 
which would make your index a lot smaller.  Check your schema for 
information that could be trimmed.  You might not need "stored" on some 
fields, especially if the original values are available from another 
source (like a database, or a central filesystem).  You may not need 
advanced features on everything, like termVectors, termPositions, etc.

If you can't make significant changes in server memory or index size, 
you might want to consider going distributed.  You'd need more servers.  
A few things (More Like This being the one that comes to mind) do not 
work in a distributed index.
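
For reference, a distributed request is an ordinary select request with a `shards` parameter listing the shard locations. The sketch below only builds such a URL. The host names are placeholders, not real servers, and the other parameters are borrowed from the example queries quoted below.

```python
from urllib.parse import urlencode

# Hypothetical shard locations; replace with real host:port/path values.
SHARDS = [
    "solr1.example.com:8983/solr",
    "solr2.example.com:8983/solr",
]


def distributed_select_url(base_url, query, shards, rows=20):
    """Build a select URL that asks Solr to query all listed shards."""
    params = {
        "q": query,
        "qt": "dismax",
        "fl": "id",
        "rows": rows,
        # Solr expects a comma-separated list of shard addresses.
        "shards": ",".join(shards),
    }
    return f"{base_url}/select?{urlencode(params)}"


url = distributed_select_url("http://localhost:8983/solr", "gene therapy", SHARDS)
print(url)
```

Whichever server receives the request fans it out to the listed shards and merges the results.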

Can you reduce the java heap size and still have Solr work correctly?  
You probably do not need your internal Solr caches to be so huge, and 
shrinking them would greatly reduce your heap needs.  Here are my cache 
settings, with the numbers being size, initialSize, then autowarmCount.

filterCache: 256, 256, 0
queryResultCache: 1024, 512, 128
documentCache: 16384, 4096, n/a

I'm using distributed search with six large shards that each take up 
nearly 13GB.  The machines (VMs) have 9GB of RAM and the java heap size 
is 1280MB.  I'm not using a lot of the advanced features like 
highlighting, so I'm not using termvectors.  Right now, we use facets 
for data mining, but not in production.  My average query time is about 
100 milliseconds, with each shard's average about half that.  
Autowarming usually takes about 10-20 seconds, though sometimes it 
balloons to about 45 seconds.  I started out with much larger cache 
numbers, but that just made my autowarm times huge.

Based on my experience, I imagine that your system takes several minutes 
to autowarm your caches when you do a commit or optimize.  If you are 
doing frequent updates, that would be a major drag on performance.

Your queryResultCache has a larger initialSize (20000) than size 
(10000), and its autowarmCount (20000) is also above size.  initialSize 
is the number of slots allocated immediately, and size is the maximum 
number of entries the cache can hold.  Apparently it's not leading to 
any disastrous problems, but you'll want to adjust accordingly.
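
A quick way to spot this kind of mismatch is to compare the attributes mechanically. The sketch below checks the three cache definitions quoted below. The `<caches>` wrapper element and the `check_caches` helper are made up so the fragment can be parsed on its own, and are not part of solrconfig.xml.

```python
import xml.etree.ElementTree as ET

# Cache definitions copied from the quoted configuration. The <caches>
# wrapper is hypothetical and exists only so the fragment parses.
CACHES_XML = """
<caches>
  <filterCache class="solr.FastLRUCache" size="5000" initialSize="1000"
               autowarmCount="500"/>
  <queryResultCache class="solr.LRUCache" size="10000" initialSize="20000"
                    autowarmCount="20000"/>
  <documentCache class="solr.LRUCache" size="10000" initialSize="10000"
                 autowarmCount="5000"/>
</caches>
"""


def check_caches(xml_text):
    """Return warnings for caches whose initialSize or autowarmCount exceeds size."""
    warnings = []
    for cache in ET.fromstring(xml_text):
        size = int(cache.get("size"))
        for attr in ("initialSize", "autowarmCount"):
            value = int(cache.get(attr, 0))
            if value > size:
                warnings.append(f"{cache.tag}: {attr}={value} exceeds size={size}")
    return warnings


for warning in check_caches(CACHES_XML):
    print(warning)
```

With the quoted settings, both warnings point at queryResultCache.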


On 8/18/2010 9:00 AM, Muneeb Ali wrote:
> First, thanks very much for a prompt reply. Here is more info:
>
> ===============
>
> a) What operating system?
> Debian GNU/Linux 5.0
>
> b) What Java container (Tomcat/Jetty)
> Jetty
>
> c) What JAVA_OPTIONS? I.e. memory, garbage collection etc.
> -Xmx9000m   -DDEBUG   -Djava.awt.headless=true
> -Dorg.mortbay.log.class=org.mortbay.log.StdErrLog
> -Dcom.sun.management.jmxremote.port=3000
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false
> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC
> -javaagent:/usr/local/lib/newrelic/newrelic.jar
>
> d) Example queries? I.e. what features, how many facets, sort fields etc
> /select?start=0&rows=20&fl=id&hl=true&hl.fl=title%2Cabstract%2Cauthors&hl.fragsize=300&hl.simple.pre=<strong>&hl.simple.post=<%2Fstrong>&qt=dismax&q=gene
> therapy
>
> We also get queries with filters examples:
>
> /select?start=0&rows=20&fl=id&hl=true&hl.fl=title%2Cabstract%2Cauthors&hl.fragsize=300&hl.simple.pre=<strong>&hl.simple.post=<%2Fstrong>&qt=dismax&q=gene
> therapy&fq=meshterm:(gene)&fq=author:(david)
>
> e) How do you load balance queries between the slaves?
>
> proxy based load balance
>
> f) What is your search latency now and @ what QPS? Also, where do you
> measure time - on the API or on the end-user page?
>
> Average response time: 2600 - 3000 ms  with average throughput: 4-6 rpm
> (from 'new relic RPM' solr performance monitor)
>
> g) How often do you replicate?
> Daily (indexer runs each night) and replicates after indexing completes at
> master. However lately we are experiencing problems right after replication,
> and have to restart jetty (it's most likely that slaves are running out of
> memory).
>
> h) Are you using warm-up-queries?
> Yes, using autoWarmCount variable in cache configuration/ these are
> specified as:
>
> <filterCache class="solr.FastLRUCache"  size="5000" initialSize="1000"
> autowarmCount="500"/>
> <queryResultCache class="solr.LRUCache" size="10000" initialSize="20000"
> autowarmCount="20000"/>
> <documentCache  class="solr.LRUCache"   size="10000"  initialSize="10000"
> autowarmCount="5000"/>
>
> i) Are you ever optimizing your index?
>
> Yes, daily after indexing. We are not doing dynamic updates to index, so I
> guess it doesn't need to be done multiple times.
>
> j) Are you using highlighting? If so, are you using the fast vector
> highlighter or the regex?
>
> Yes, we are using the default highlight component, with default fragmenter
> called 'gap' and not regex. solr.highlight.GapFragmenter, with fragsize=300.
>
> k) What other search components are you using?
> spellcheck component, we will be using faceting in future soon.
>
> l) Are you using RAID setup for the disks? If so, what kind of RAID, what
> stripe-size and block size?
>
> Yes, RAID-0:
> $>  cat /proc/mdstat
> Personalities : [raid0]
> md0 : active raid0 sda1[0] sdb1[1]
>        449225344 blocks 64k chunks
>
>
> ==============
>
> I haven't benchmarked it yet as such, however here is the debugQuery
> <section>  from query results:
>
> <lst name="debug">
> <str name="rawquerystring">case study research</str>
> <str name="querystring">case study research</str>
> <str name="parsedquery">
> +(DisjunctionMaxQuery((tags:case^1.2 | authors:case^7.5 | title:case^65.5 |
> matchAll:case | keywords:case^2.5 | meshterm:case^3.2 |
> abstract1:case^9.5)~0.01) DisjunctionMaxQuery((tags:studi^1.2 |
> authors:study^7.5 | title:study^65.5 | matchAll:study | keywords:studi^2.5 |
> meshterm:studi^3.2 | abstract1:studi^9.5)~0.01)
> DisjunctionMaxQuery((tags:research^1.2 | authors:research^7.5 |
> title:research^65.5 | matchAll:research | keywords:research^2.5 |
> meshterm:research^3.2 | abstract1:research^9.5)~0.01))
> DisjunctionMaxQuery((tags:"case studi research"~50^1.2 | authors:"case study
> research"~50^7.5 | title:"case study research"~50^65.5 | matchAll:case study
> research | keywords:"case studi research"~50^2.5 | meshterm:"case studi
> research"~50^3.2 | abstract1:"case studi research"~50^9.5)~0.01)
> FunctionQuery((sum(sdouble(yearScore)))^1.1)
> FunctionQuery((sum(sdouble(readerScore)))^2.0)
> </str>
> <str name="parsedquery_toString">
> +((tags:case^1.2 | authors:case^7.5 | title:case^65.5 | matchAll:case |
> keywords:case^2.5 | meshterm:case^3.2 | abstract1:case^9.5)~0.01
> (tags:studi^1.2 | authors:study^7.5 | title:study^65.5 | matchAll:study |
> keywords:studi^2.5 | meshterm:studi^3.2 | abstract1:studi^9.5)~0.01
> (tags:research^1.2 | authors:research^7.5 | title:research^65.5 |
> matchAll:research | keywords:research^2.5 | meshterm:research^3.2 |
> abstract1:research^9.5)~0.01) (tags:"case studi research"~50^1.2 |
> authors:"case study research"~50^7.5 | title:"case study research"~50^65.5 |
> matchAll:case study research | keywords:"case studi research"~50^2.5 |
> meshterm:"case studi research"~50^3.2 | abstract1:"case studi
> research"~50^9.5)~0.01 (sum(sdouble(yearScore)))^1.1
> (sum(sdouble(readerScore)))^2.0
> </str>
> <lst name="explain">
> <str name="7644c450-6d00-11df-a2b2-0026b95e3eb7">
>
> 9.473454 = (MATCH) sum of:
>    2.247054 = (MATCH) sum of:
>      0.7535966 = (MATCH) max plus 0.01 times others of:
>        0.7535966 = (MATCH) weight(title:case^65.5 in 6557735), product of:
>          0.29090396 = queryWeight(title:case^65.5), product of:
>            65.5 = boost
>            5.181068 = idf(docFreq=204956, maxDocs=13411507)
>            8.5721357E-4 = queryNorm
>          2.590534 = (MATCH) fieldWeight(title:case in 6557735), product of:
>            1.0 = tf(termFreq(title:case)=1)
>            5.181068 = idf(docFreq=204956, maxDocs=13411507)
>            0.5 = fieldNorm(field=title, doc=6557735)
>      0.5454388 = (MATCH) max plus 0.01 times others of:
>        0.5454388 = (MATCH) weight(title:study^65.5 in 6557735), product of:
>          0.24748746 = queryWeight(title:study^65.5), product of:
>            65.5 = boost
>            4.4078097 = idf(docFreq=444103, maxDocs=13411507)
>            8.5721357E-4 = queryNorm
>          2.2039049 = (MATCH) fieldWeight(title:study in 6557735), product of:
>            1.0 = tf(termFreq(title:study)=1)
>            4.4078097 = idf(docFreq=444103, maxDocs=13411507)
>            0.5 = fieldNorm(field=title, doc=6557735)
>      0.9480188 = (MATCH) max plus 0.01 times others of:
>        0.9480188 = (MATCH) weight(title:research^65.5 in 6557735), product
> of:
>          0.32627863 = queryWeight(title:research^65.5), product of:
>            65.5 = boost
>            5.8110995 = idf(docFreq=109154, maxDocs=13411507)
>            8.5721357E-4 = queryNorm
>          2.9055498 = (MATCH) fieldWeight(title:research in 6557735), product
> of:
>            1.0 = tf(termFreq(title:research)=1)
>            5.8110995 = idf(docFreq=109154, maxDocs=13411507)
>            0.5 = fieldNorm(field=title, doc=6557735)
>    6.6579494 = (MATCH) max plus 0.01 times others of:
>      6.6579494 = weight(title:"case study research"~50^65.5 in 6557735),
> product of:
>        0.86467004 = queryWeight(title:"case study research"~50^65.5), product
> of:
>          65.5 = boost
>          15.399977 = idf(title: case=204956 study=444103 research=109154)
>          8.5721357E-4 = queryNorm
>        7.6999884 = fieldWeight(title:"case study research" in 6557735),
> product of:
>          1.0 = tf(phraseFreq=1.0)
>          15.399977 = idf(title: case=204956 study=444103 research=109154)
>          0.5 = fieldNorm(field=title, doc=6557735)
>    0.053200547 = (MATCH) FunctionQuery(sum(sdouble(yearScore))), product of:
>      56.420166 = sum(sdouble(yearScore)=56.42016783216783)
>      1.1 = boost
>      8.5721357E-4 = queryNorm
>    0.5152504 = (MATCH) FunctionQuery(sum(sdouble(readerScore))), product of:
>      300.53793 = sum(sdouble(readerScore)=300.5379289983797)
>      2.0 = boost
>      8.5721357E-4 = queryNorm
> </str>
> ...
> ...
> ...
> <str name="e3542c60-6d06-11df-afb8-0026b95d30b2">
>
> 9.212496 = (MATCH) sum of:
>    2.247054 = (MATCH) sum of:
>      0.7535966 = (MATCH) max plus 0.01 times others of:
>        0.7535966 = (MATCH) weight(title:case^65.5 in 12274669), product of:
>          0.29090396 = queryWeight(title:case^65.5), product of:
>            65.5 = boost
>            5.181068 = idf(docFreq=204956, maxDocs=13411507)
>            8.5721357E-4 = queryNorm
>          2.590534 = (MATCH) fieldWeight(title:case in 12274669), product of:
>            1.0 = tf(termFreq(title:case)=1)
>            5.181068 = idf(docFreq=204956, maxDocs=13411507)
>            0.5 = fieldNorm(field=title, doc=12274669)
>      0.5454388 = (MATCH) max plus 0.01 times others of:
>        0.5454388 = (MATCH) weight(title:study^65.5 in 12274669), product of:
>          0.24748746 = queryWeight(title:study^65.5), product of:
>            65.5 = boost
>            4.4078097 = idf(docFreq=444103, maxDocs=13411507)
>            8.5721357E-4 = queryNorm
>          2.2039049 = (MATCH) fieldWeight(title:study in 12274669), product
> of:
>            1.0 = tf(termFreq(title:study)=1)
>            4.4078097 = idf(docFreq=444103, maxDocs=13411507)
>            0.5 = fieldNorm(field=title, doc=12274669)
>      0.9480188 = (MATCH) max plus 0.01 times others of:
>        0.9480188 = (MATCH) weight(title:research^65.5 in 12274669), product
> of:
>          0.32627863 = queryWeight(title:research^65.5), product of:
>            65.5 = boost
>            5.8110995 = idf(docFreq=109154, maxDocs=13411507)
>            8.5721357E-4 = queryNorm
>          2.9055498 = (MATCH) fieldWeight(title:research in 12274669), product
> of:
>            1.0 = tf(termFreq(title:research)=1)
>            5.8110995 = idf(docFreq=109154, maxDocs=13411507)
>            0.5 = fieldNorm(field=title, doc=12274669)
>    6.6579494 = (MATCH) max plus 0.01 times others of:
>      6.6579494 = weight(title:"case study research"~50^65.5 in 12274669),
> product of:
>        0.86467004 = queryWeight(title:"case study research"~50^65.5), product
> of:
>          65.5 = boost
>          15.399977 = idf(title: case=204956 study=444103 research=109154)
>          8.5721357E-4 = queryNorm
>        7.6999884 = fieldWeight(title:"case study research" in 12274669),
> product of:
>          1.0 = tf(phraseFreq=1.0)
>          15.399977 = idf(title: case=204956 study=444103 research=109154)
>          0.5 = fieldNorm(field=title, doc=12274669)
>    0.030677302 = (MATCH) FunctionQuery(sum(sdouble(yearScore))), product of:
>      32.533848 = sum(sdouble(yearScore)=32.533846153846156)
>      1.1 = boost
>      8.5721357E-4 = queryNorm
>    0.27681494 = (MATCH) FunctionQuery(sum(sdouble(readerScore))), product of:
>      161.46207 = sum(sdouble(readerScore)=161.46207100162033)
>      2.0 = boost
>      8.5721357E-4 = queryNorm
> </str>
> </lst>
> <str name="QParser">DisMaxQParser</str>
> <null name="altquerystring"/>
> <arr name="boostfuncs">
> <str>
>
>           sum(readerScore)^2  sum(yearScore)^1.1
>
>
> </str>
> </arr>
> <lst name="timing">
> <double name="time">5468.0</double>
> <lst name="prepare">
> <double name="time">1.0</double>
> <lst name="org.apache.solr.handler.component.QueryComponent">
> <double name="time">1.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.FacetComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.HighlightComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.StatsComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.SpellCheckComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.DebugComponent">
> <double name="time">0.0</double>
> </lst>
> </lst>
> <lst name="process">
> <double name="time">5467.0</double>
> <lst name="org.apache.solr.handler.component.QueryComponent">
> <double name="time">4734.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.FacetComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.HighlightComponent">
> <double name="time">231.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.StatsComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.SpellCheckComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.DebugComponent">
> <double name="time">501.0</double>
> </lst>
> </lst>
> </lst>
> </lst>

