lucene-java-user mailing list archives

From Justus Pendleton <>
Subject Re: Performance of never optimizing
Date Wed, 05 Nov 2008 06:04:34 GMT
On 05/11/2008, at 4:36 AM, Michael McCandless wrote:
> If possible, you should try to use a larger corpus (eg Wikipedia)  
> rather than multiply Reuters by N, which creates unnatural term  
> frequency distribution.

I'll replicate the tests with the Wikipedia corpus over the next few
days and regenerate the graphs to show the data points in addition to
the curves. The data I am using comes from the benchmark output, for
example:

      [java] Operation            round  mrg  runCnt  recsPerRun  rec/s  elapsedSec  avgUsedMem  avgTotalMem
      [java] UnoptSearch_100_Par      0    2       1         100  230.4        0.43  29,517,680   44,834,816

I am plotting the "rec/s" which I am (possibly mistakenly)  
interpreting to mean "searches per second" as I asked for 100 searches  
and it took 0.43 seconds to perform them all.
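
As a quick sanity check of that reading, here is a minimal sketch that recomputes the rate from the other columns of the row. It assumes that "rec/s" is simply runCnt * recsPerRun / elapsedSec, which is an assumption about the benchmark report and not something confirmed here; the class and method names are hypothetical.

```java
public class BenchRate {
    // Assumed definition (not confirmed from the benchmark source):
    // records per second = total records processed / elapsed seconds.
    static double recsPerSec(long runCnt, long recsPerRun, double elapsedSec) {
        return (runCnt * recsPerRun) / elapsedSec;
    }

    public static void main(String[] args) {
        // Values copied from the UnoptSearch_100_Par row:
        // runCnt = 1, recsPerRun = 100, elapsedSec = 0.43.
        double rate = recsPerSec(1, 100, 0.43);
        System.out.println("recomputed rec/s: " + rate);
    }
}
```

With the values from the row this gives roughly 232.6, close to the reported 230.4. The gap is what you would expect if elapsedSec is rounded to two decimals in the report (100 / 230.4 is about 0.434 seconds), so the "searches per second" reading looks consistent with the numbers, given that each record in this task is one search.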

> It's best to use a real query log, if possible, to run the queries.   
> If you are expecting your production machines to have plenty of RAM  
> to hold the index, then you should definitely run the queries  
> through once, discard it, to get all stuff loaded in RAM including  
> the OS caching all required data in its IO cache.
> Not opening/closing a reader per search should change the graphs  
> quite a bit (for the better) and hopefully change some of the odd  
> things you are seeing (in the questions below).

I don't believe our large users have enough memory for their Lucene
indexes to fit in RAM (especially given that we use quite a bit of RAM
for other things). I think we also close readers pretty frequently
(whenever any user updates a JIRA issue, which I assume happens
nearly constantly when you've got thousands of users). I was trying to
mimic our usage as closely as I could to see whether Lucene behaves
pathologically poorly given our current architecture. There have been
some excellent suggestions about using in-memory indexes for recent
updates, but changes of that kind are, unfortunately, currently outside
of my purview :-(

Given that our current usage may be suboptimal :-/ does anyone have  
any ideas about what may be causing the anomalies I identified earlier?

