lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Profiling Solr Lucene for query
Date Sun, 08 Sep 2013 13:00:09 GMT
Please send Solr-related inquiries to the Solr user list - this is the 
Lucene (Java) user list.

-- Jack Krupansky

-----Original Message----- 
From: Manuel Le Normand
Sent: Sunday, September 08, 2013 7:03 AM
To: java-user@lucene.apache.org
Subject: Profiling Solr Lucene for query

Hello all,
Looking at the slowest 10% of my queries, I see very poor performance (~60 sec
per query).
These queries have many conditions on my main field (more than a
hundred), including phrase queries, and use rows=1000. I only return IDs,
though.
I am fairly confident that this poor performance is due to slow storage
(which is beyond my control for now). Despite this, I want to improve
performance.

As taught in school, I started by profiling these queries; the data from a
~1 minute profile is located here:
http://picpaste.com/pics/IMG_20130908_132441-ZyrfXeTY.1378637843.jpg
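A rough sketch of this kind of thread-dump sampling, for readers who want to
repeat it: take several dumps, keep the RUNNABLE threads, and count which
method is on top of each stack. The class TopFrameSampler and its methods are
hypothetical names, not part of Solr or Lucene, and the code samples the JVM
it runs in (for a Solr server one would more likely take dumps with an
external tool). It assumes Java 8 or later.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts the innermost frame of sampled stack traces.
final class TopFrameSampler {

    // Builds a key such as "pkg.Class.method" from the innermost frame.
    static String topFrame(StackTraceElement[] stack) {
        if (stack.length == 0) {
            return "<no frames>";
        }
        return stack[0].getClassName() + "." + stack[0].getMethodName();
    }

    // Adds one observation of the given stack to the counts.
    static void record(Map<String, Integer> counts, StackTraceElement[] stack) {
        counts.merge(topFrame(stack), 1, Integer::sum);
    }

    // Takes `rounds` dumps of the current JVM, pausing between them,
    // and counts the top frame of every RUNNABLE thread.
    static Map<String, Integer> sample(int rounds, long pauseMillis)
            throws InterruptedException {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < rounds; i++) {
            ThreadInfo[] infos =
                ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
            for (ThreadInfo info : infos) {
                if (info != null && info.getThreadState() == Thread.State.RUNNABLE) {
                    record(counts, info.getStackTrace());
                }
            }
            Thread.sleep(pauseMillis);
        }
        return counts;
    }
}
```

A method that dominates the counts across many dumps is where the threads
spend most of their wall-clock time, whether computing or blocked in a read.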

Main observation: most of the time is spent waiting in readVInt, whose stack
trace (2 out of 2 thread dumps) is:

catalina-exec-3870 - Thread t@6615
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock(BlockTreeTermsReader.java:2357)
at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BlockTreeTermsReader.java:1745)
at org.apache.lucene.index.TermContext.build(TermContext.java:95)
at org.apache.lucene.search.PhraseQuery$PhraseWeight.<init>(PhraseQuery.java:221)
at org.apache.lucene.search.PhraseQuery.createWeight(PhraseQuery.java:326)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:675)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
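
For context on the top frame: a VInt is Lucene's variable-length integer
encoding, with 7 payload bits per byte, the high bit set when another byte
follows, and the low-order group first. The sketch below decodes that format
from a byte array. It is not Lucene's implementation; VIntSketch and its
readVInt(byte[], int) signature are hypothetical names used only for
illustration.

```java
// Hypothetical stand-alone decoder for the VInt format (not Lucene's code).
final class VIntSketch {

    // Decodes one VInt starting at buf[pos] and returns its value.
    static int readVInt(byte[] buf, int pos) {
        int value = 0;
        // An int needs at most 5 groups of 7 bits (shifts 0, 7, 14, 21, 28).
        for (int shift = 0; shift <= 28; shift += 7) {
            byte b = buf[pos++];
            value |= (b & 0x7F) << shift;   // low 7 bits carry the payload
            if (b >= 0) {                   // high bit clear: last byte
                return value;
            }
        }
        throw new IllegalArgumentException("VInt longer than 5 bytes");
    }
}
```

With this encoding, 128 is stored as the two bytes 0x80 0x01. Each decoded
byte is a separate read from the underlying input, so a decoder like this is a
natural place for a thread to be observed when the data it needs is not yet in
memory.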


So I am actually waiting on IO, as expected, but I may be page faulting too
often while looking up the term blocks (tim file), i.e. locating the term.
As I am reindexing now, would it be useful to lower the termInterval
(default 128)? Since the FSTs (tip files) are so small (a few 10-100 MB) that
there is no memory contention, could I lower this parameter to 8, for
example?

General configs:
Solr 4.3
36 shards, each with a few million docs

Thanks in advance,
Manu 


