lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Profiling Solr Lucene for query
Date Sun, 08 Sep 2013 11:54:38 GMT
Why have 36 shards for just a few million docs each? That's the first thing
I'd look at. How many physical boxes? How much memory per JVM? How
many JVMs? How much physical memory per box?

'Cause this seems excessive time-wise for loading the info.....

Best
Erick


On Sun, Sep 8, 2013 at 7:03 AM, Manuel Le Normand <
manuel.lenormand@gmail.com> wrote:

> Hello all
> Looking at the slowest 10% of my queries, I get very bad performance (~60 sec
> per query).
> These queries have lots of conditions on my main field (more than a
> hundred), including phrase queries, and rows=1000. I do return only ids,
> though.
> I can say quite firmly that this bad performance is due to a slow storage
> issue (that is beyond my control for now). Despite this, I want to improve
> my performance.
>
> As taught in school, I started profiling these queries, and the data from a
> ~1 minute profile is located here:
> http://picpaste.com/pics/IMG_20130908_132441-ZyrfXeTY.1378637843.jpg
>
> Main observation: most of the time I am waiting in readVInt, whose stack trace
> (2 out of 2 thread dumps) is:
>
> catalina-exec-3870 - Thread t@6615
>  java.lang.Thread.State: RUNNABLE
>  at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
>  at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock(BlockTreeTermsReader.java:2357)
>  at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BlockTreeTermsReader.java:1745)
>  at org.apache.lucene.index.TermContext.build(TermContext.java:95)
>  at org.apache.lucene.search.PhraseQuery$PhraseWeight.<init>(PhraseQuery.java:221)
>  at org.apache.lucene.search.PhraseQuery.createWeight(PhraseQuery.java:326)
>  at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
>  at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
>  at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
>  at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
>  at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
>  at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
>  at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:675)
>  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
>
>
> So I am actually waiting on IO, as expected, but I may be page faulting too
> many times while looking up the term blocks (tim file), i.e. locating the term.
> As I am reindexing now, would it be useful to lower the termInterval
> (default 128)? Since the FSTs (tip files) are small (10-100 MB) and
> there is no memory contention, could I lower this param to 8, for
> example?
>
> General configs:
> Solr 4.3
> 36 shards, each with a few million docs
>
> Thanks in advance,
> Manu
>
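
Editorial note: for readers unfamiliar with the readVInt frame at the top of the trace, here is a minimal sketch of variable-length integer decoding, assuming the usual layout of seven payload bits per byte, low-order group first, with the high bit marking "more bytes follow". The class name VIntSketch and both helper methods are hypothetical and written for illustration only; this is not Lucene's source. The decoding itself is a handful of bit operations per byte, so time attributed to it in a profile more plausibly reflects the underlying read (for example a page fault on the terms file) than CPU work, which is consistent with the poster's IO diagnosis.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration class; not part of Lucene.
public class VIntSketch {

    // Encode an int as 1-5 bytes: 7 payload bits per byte, low-order group
    // first; the high bit of each byte is set when more bytes follow.
    static byte[] writeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Decode one variable-length int. Each in.read() is where a real index
    // reader could stall on storage; the bit arithmetic itself is trivial.
    static int readVInt(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("truncated VInt");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("VInt longer than 5 bytes");
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = writeVInt(300);
        int decoded = readVInt(new ByteArrayInputStream(encoded));
        System.out.println(encoded.length + " bytes -> " + decoded); // prints "2 bytes -> 300"
    }
}
```

Small values such as term-block lengths and deltas fit in one or two bytes, which is why a format like this is read so often during term lookup.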
