lucene-java-user mailing list archives

From Manuel Le Normand <manuel.lenorm...@gmail.com>
Subject Re: Profiling Solr Lucene for query
Date Sun, 08 Sep 2013 14:12:46 GMT
As I am not running queries in parallel, every query is handled by a single
CPU, so I didn't see any benefit from splitting the index into fewer shards.
Moreover, while running load tests on a single shard I could see I was
CPU-bound and got much better performance by splitting the index into many
shards, though the improvement becomes asymptotic once the shard count rises
beyond roughly this number.

These 36 servers (each hosting 2 replicas) are virtual machines with 16GB of
memory each (4GB for the JVM, 12GB left for OS caching), consuming 260GB of
mounted disk for the index files.
As you can see from the numbers, only a small portion of my index fits in
memory, which is why I try to minimize the number of times the term blocks
are read.

The benefit of lowering the term interval would be to force the FST into
memory (on the JVM heap, thanks to the NRTCachingDirectory), since I do not
control caching of the term file (the OS cache loads an average of 6% of it).
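For anyone wanting to try the same setup, here is a sketch of the relevant
solrconfig.xml fragments. This assumes Solr 4.x; NRTCachingDirectoryFactory is
a real Solr factory, but note that whether the active postings format actually
honors termIndexInterval depends on the codec in use:

```xml
<!-- solrconfig.xml (Solr 4.x) - illustrative sketch, not a verified config -->

<!-- Wrap the filesystem directory in an NRT RAM cache so small,
     recently written files are served from the JVM heap -->
<directoryFactory name="DirectoryFactory"
                  class="solr.NRTCachingDirectoryFactory"/>

<indexConfig>
  <!-- Term index interval; default 128. The default BlockTree-based
       codec in 4.x may ignore this setting. -->
  <termIndexInterval>8</termIndexInterval>
</indexConfig>
```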


On Sun, Sep 8, 2013 at 2:54 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> Why have 36 shards for just a few million docs each? That's the first thing
> I'd
> look at. How many physical boxes? How much memory per JVM? How
> many JVMs? How much physical memory per box?
>
> 'Cause this seems excessive time-wise for loading the info.....
>
> Best
> Erick
>
>
> On Sun, Sep 8, 2013 at 7:03 AM, Manuel Le Normand <
> manuel.lenormand@gmail.com> wrote:
>
> > Hello all
> > Looking at the 10% slowest queries, I get very bad performance (~60 sec
> > per query).
> > These queries have lots of conditions on my main field (more than a
> > hundred), including phrase queries and rows=1000. I do return only ids,
> > though.
> > I can quite firmly say that this bad performance is due to a slow storage
> > issue (beyond my control for now). Despite this, I want to improve my
> > performance.
> >
> > As taught in school, I started profiling these queries; the data from a
> > ~1-minute profile is located here:
> > http://picpaste.com/pics/IMG_20130908_132441-ZyrfXeTY.1378637843.jpg
> >
> > Main observation: most of the time I wait for readVInt, whose stack trace
> > (2 out of 2 thread dumps) is:
> >
> > catalina-exec-3870 - Thread t@6615
> >  java.lang.Thread.State: RUNNABLE
> >  at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
> >  at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock(BlockTreeTermsReader.java:2357)
> >  at org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BlockTreeTermsReader.java:1745)
> >  at org.apache.lucene.index.TermContext.build(TermContext.java:95)
> >  at org.apache.lucene.search.PhraseQuery$PhraseWeight.<init>(PhraseQuery.java:221)
> >  at org.apache.lucene.search.PhraseQuery.createWeight(PhraseQuery.java:326)
> >  at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
> >  at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
> >  at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
> >  at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
> >  at org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:183)
> >  at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
> >  at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:675)
> >  at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
> >
> >
> > So I do actually wait for IO as expected, but I might be page faulting
> > too many times while looking up the term blocks (.tim file), i.e.
> > locating the term.
> > As I am reindexing now, would it be useful to lower the termInterval
> > (default 128)? Since the FSTs (.tip files) are small (a few tens to
> > hundreds of MB) and there is no memory contention, could I lower this
> > parameter to 8, for example?
> >
> > General configs:
> > solr 4.3
> > 36 shards, each has few million docs
> >
> > Thanks in advance,
> > Manu
> >
>
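As an aside, the readVInt frame at the top of that stack is Lucene's
variable-length integer decoder: each byte carries 7 payload bits, and a set
high bit means another byte follows. A minimal standalone sketch of the
decoding loop (not Lucene's actual DataInput implementation, which reads from
a stream rather than a byte array):

```java
public class VIntDemo {
    // Decode a Lucene-style vInt starting at pos: 7 payload bits per byte,
    // low-order bytes first; a set high bit (0x80) means more bytes follow.
    static int readVInt(byte[] buf, int pos) {
        byte b = buf[pos++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    public static void main(String[] args) {
        // 300 encodes as 0xAC 0x02: 0xAC = continuation bit + low 7 bits (44),
        // then 0x02 contributes 2 << 7 = 256; 44 + 256 = 300.
        System.out.println(readVInt(new byte[]{(byte) 0xAC, 0x02}, 0)); // prints 300
        System.out.println(readVInt(new byte[]{0x05}, 0));              // prints 5
    }
}
```

The decode itself is cheap; the cost in the profile comes from the fact that
each loadBlock that misses the OS cache triggers a page fault before any
bytes can be decoded, which is why reducing the number of block reads matters.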
