lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <>
Subject Re: Throughput doesn't increase when using more concurrent threads
Date Mon, 21 Nov 2005 17:43:13 GMT
Jay Booth wrote:
> I had a similar problem with threading, the problem turned out to be that in
> the back end of the FSDirectory class I believe it was, there was a
> synchronized block on the actual RandomAccessFile resource when reading a
> block of data from it... high-concurrency situations caused threads to stack
> up in front of this synchronized block and our CPU time wound up being spent
> thrashing between blocked threads instead of doing anything useful.

This is correct.  In Lucene, multiple streams per file are created by 
cloning, and all clones of an FSDirectory input stream share a 
RandomAccessFile and must synchronize input from it.  MmapDirectory does 
not have this limitation.  If your indexes are less than a few GB or you 
are using 64-bit hardware, then MmapDirectory should work well for you. 
  Otherwise it would be simple to write an nio-based Directory that does 
not use mmap that is also unsynchronized.  Such a contribution would be 

> Making multiple IndexSearchers and FSDirectories didn't help because in the
> back end, lucene consults a singleton HashMap of some kind (don't remember
> implementation) that maintained a single FSDirectory for any given index
> being accessed from the JVM... multiple calls to FSDirectory.getDirectory
> actually return the same FSDirectory object with synchronization at the same
> point.

This does not make sense to me.  FSDirectory does keep a cache of 
FSDirectory instances, but i/o should not be synchronized on these.  One 
should be able to open multiple input streams on the same file from an 
FSDirectory.  But this would not be a great solution, since file handle 
limits would soon become a problem.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message