lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: NIO2 Directory implementations
Date Sun, 17 Mar 2013 15:12:37 GMT
These directory implementations sound very interesting!

Yes, please open a Jira issue and attach a patch.

Some responses below:

On Sat, Mar 16, 2013 at 10:13 PM, Michael Poindexter
<staticsnow@gmail.com> wrote:
> As part of a project using Lucene I have implemented a trio of Directories
> roughly corresponding to the FSDirectory implementations in core.  These
> directory implementations use the NIO2 APIs in JDK7 when opening files.
>  This ensures that on Windows the files are opened in a mode that allows
> deletion even if the file is open elsewhere.
>
> 1.) JDK7MMapDirectory - Roughly the same as MMapDirectory.  Uses
> FileChannel.open (instead of RandomAccessFile) to create a FileChannel that
> then has map() called on it to create the mapped buffers.
> 2.) JDK7NIOFSDirectory - Roughly the same as NIOFSDirectory, but uses
> FileChannel.open to create the file channel instead of RandomAccessFile.
> 3.) JDK7AsyncFSDirectory - This one is new and different.  I needed a
> replacement for SimpleFSDirectory (that was not susceptible to problems if
> interrupt()'ed) and did not have the synchronization problems on Windows of
> NIOFSDirectory.  This one is used where SimpleFSDirectory could have been
> used, but uses an AsynchronousFileChannel to do its work.  The actual
> operation is still synchronous, but on Windows AsynchronousFileChannel uses
> overlapped IO, and hence does not require synchronization on the position
> and should be safe for interrupts.

Awesome!

On Unix would this impl also be safe for interrupts?
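
For readers following along, the two NIO2 access patterns described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed JDK7*Directory code: the class name Nio2ReadSketch and the helpers readMapped and readAt are hypothetical, and only standard JDK 7 calls (FileChannel.open, FileChannel.map, AsynchronousFileChannel.read) are used.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;

// Hypothetical sketch; names are illustrative and not part of Lucene.
public class Nio2ReadSketch {

    // Open with FileChannel.open (instead of RandomAccessFile) and map the file read-only.
    public static byte[] readMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
    }

    // Positional read through AsynchronousFileChannel. Every read names its own
    // offset, so no shared file position is involved; blocking on the Future
    // keeps the call synchronous from the caller's point of view.
    public static byte[] readAt(Path file, long position, int length)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            while (buf.hasRemaining()) {
                int n = ch.read(buf, position + buf.position()).get();
                if (n < 0) {
                    break; // reached end of file before 'length' bytes were read
                }
            }
            buf.flip();
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("nio2-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "hello lucene".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(readMapped(tmp), StandardCharsets.UTF_8));
        System.out.println(new String(readAt(tmp, 6, 6), StandardCharsets.UTF_8));
    }
}
```

Whether the second pattern really is safe under Thread.interrupt() on each platform is exactly the open question above; the sketch only shows the shape of the calls.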

> A couple of questions:
> 1.)  Is there any interest in me contributing these to Lucene?  They
> require JDK7+, but perhaps they could go in a contrib module?

Maybe in the misc module (lucene/misc)?

> 2.)  While implementing these I noticed the implementation of
> FSDirectory.sync seems a little strange:  It just opens a new
> RandomAccessFile and forces a sync using it.  The JavaDocs seem to imply
> that this would force a sync on the file handle associated with the
> RandomAccessFile, but that's not the file handle that was written to as
> part of an IndexOutput.  On Windows at least this won't matter, but it
> seems theoretically wrong...i.e. according to the JavaDoc on a given
> platform this style of operation could have no impact if I am understanding
> it correctly.  It seems like maybe it would be better to have a sync() call
> on an IndexOutput that can be called before closing it...am I missing
> something here?

Yes, this is indeed very strange: ideally we'd fsync on the
IndexOutput before it was closed, but this is unfortunately tricky to
do in Lucene because at the time we write to the IndexOutput we don't
know if it will be a file that will be committed in the future.

This was also raised in https://issues.apache.org/jira/browse/LUCENE-3237
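
To make the two approaches concrete, here is a minimal sketch contrasting them. It is not Lucene's FSDirectory code: the class SyncSketch and both method names are hypothetical. The first method forces the same channel that did the writing; the second reopens the file afterwards and syncs through the new handle, which is the style questioned above. The FileChannel.force JavaDoc only guarantees flushing changes made through that channel, which is why the second style looks theoretically weaker.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; not the actual FSDirectory implementation.
public class SyncSketch {

    // Style A: sync through the handle that performed the writes, before closing it.
    public static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // flush file content and metadata for this channel
        }
    }

    // Style B: the file was written and closed earlier; reopen it and sync
    // through the new descriptor.
    public static void syncByReopening(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.getFD().sync();
        }
    }
}
```

Style A needs the writer to know at close time that the file matters, which is the difficulty described above: the decision to commit comes later than the write.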

> 3.)  What is the best way to go about benchmarking/testing these new
> implementations to compare against the core FSDirectory implementations?
>  I've seen some references to randomized tests and benchmarks on the
> developer pages on the Lucene website, but I didn't see anything that was
> along the lines of "Here's how to run the benchmarks"...any pointers would
> be much appreciated.

I think start with an issue/patch and then others can help with
benchmarking?  I would use luceneutil to run a standard
indexing/searching test using the Wikipedia corpus.

> Thanks,

Thank you!

Mike McCandless

http://blog.mikemccandless.com
