lucene-java-commits mailing list archives

From Apache Wiki <>
Subject [Lucene-java Wiki] Update of "NearRealtimeSearch" by JasonRutherglen
Date Wed, 30 Sep 2009 23:53:12 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The "NearRealtimeSearch" page has been changed by JasonRutherglen:

  Sample code:
- IndexWriter writer;
+ IndexWriter writer; // create an IndexWriter here
+ Document doc = null; // create a document here
- writer.addDocument(doc); // update
+ writer.addDocument(doc); // update a document
  IndexReader reader = writer.getReader(); // get a reader with the new doc
+ Document addedDoc = reader.document(0);
  ==== Internals ====
    * IndexWriter pools SegmentReaders
    * Field caches are loaded and searched at the segment level (LUCENE-1483), so they only need to be loaded per segment rather than for all segments (which was the behavior pre-2.9)
-   * IndexWriter.getReader (LUCENE-1516) flushes changes without calling commit or flushing deletes to disk
+   * IndexWriter.getReader (LUCENE-1516) flushes updates without calling commit or flushing deletes to disk (i.e. doesn't call fsync)
    * Indexing is faster because, instead of waiting for the RAM buffer to be written to disk, the RAM buffer is written more quickly to IndexWriter's internal RAMDirectory
    * FileSwitchDirectory (LUCENE-1618) is used by NRT to write potentially large docstores and term vectors to disk rather than to the RAMDirectory.  This makes more RAM available for NRT.
    * IndexReader.clone (LUCENE-1314) is used in IndexWriter to carry deletes over within segment readers.  It is also used to freeze a version so that a merge may complete while deletes are safely applied and searched concurrently.
    * Cloning bitvectors could rapidly consume heap space if updates are frequent, so LUCENE-1526
divides the bitvector into chunks.
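
The chunking idea in the last bullet can be illustrated with a small standalone sketch. This is not the LUCENE-1526 patch and not Lucene's BitVector; the class name `ChunkedBitVector`, its methods, and the chunk size are all hypothetical, and the sketch ignores thread safety. It only shows why dividing the bitvector into chunks helps: a clone shares every chunk with its source, and a later write copies just the one chunk it touches instead of the whole vector.

```java
import java.util.Arrays;

/** Hypothetical copy-on-write bit vector split into fixed-size chunks. */
final class ChunkedBitVector {
    private final int chunkBits;    // bits per chunk; must be a multiple of 64
    private final long[][] chunks;  // chunk storage, possibly shared with clones
    private final boolean[] owned;  // true if this instance may write the chunk in place

    /** Capacity is numBits rounded up to a whole number of chunks. */
    ChunkedBitVector(int numBits, int chunkBits) {
        if (numBits <= 0 || chunkBits <= 0 || chunkBits % 64 != 0) {
            throw new IllegalArgumentException("chunkBits must be a positive multiple of 64");
        }
        this.chunkBits = chunkBits;
        int numChunks = (numBits + chunkBits - 1) / chunkBits;
        this.chunks = new long[numChunks][chunkBits / 64];
        this.owned = new boolean[numChunks];
        Arrays.fill(owned, true);
    }

    /** Copy constructor: shares every chunk with the source. */
    private ChunkedBitVector(ChunkedBitVector src) {
        this.chunkBits = src.chunkBits;
        this.chunks = src.chunks.clone();            // shallow copy of the chunk table only
        this.owned = new boolean[chunks.length];     // the clone owns nothing yet
        Arrays.fill(src.owned, false);               // the source must now copy before writing too
    }

    /** Cheap clone: cost is proportional to the number of chunks, not the number of bits. */
    ChunkedBitVector copy() {
        return new ChunkedBitVector(this);
    }

    /** Sets a bit, copying the affected chunk first if it may be shared. */
    void set(int bit) {
        int c = bit / chunkBits;
        if (!owned[c]) {
            chunks[c] = chunks[c].clone();
            owned[c] = true;
        }
        int offset = bit % chunkBits;
        chunks[c][offset >>> 6] |= 1L << (offset & 63);
    }

    boolean get(int bit) {
        int offset = bit % chunkBits;
        return (chunks[bit / chunkBits][offset >>> 6] & (1L << (offset & 63))) != 0;
    }

    /** True if both vectors still point at the same array for the given chunk. */
    boolean sharesChunkWith(ChunkedBitVector other, int chunk) {
        return chunks[chunk] == other.chunks[chunk];
    }
}
```

With a layout like this, frequent updates that each touch one document would copy one chunk per clone rather than the full bitvector, which is the heap saving the bullet describes.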
+ ==== IO Cache ====
+ Large merges can evict existing segments from the IO cache.  A query that was fast may suddenly become slow due to the latency of accessing the hard drive.  One way to address this is to implement a JNI-based Directory that calls fadvise or madvise.  The advise calls would allow the segment merger to tell the OS not to load the segments being merged into the IO
