lucene-dev mailing list archives

From "Jason Rutherglen" <jason.rutherg...@gmail.com>
Subject Re: Realtime Search
Date Fri, 09 Jan 2009 20:18:49 GMT
> "But I think for realtime we don't want to be using IW's deletion at
all.  We should do all deletes via the IndexReader.  In fact if IW has
handed out a reader (via getReader()) and that reader (or a reopened
derivative) remains open we may have to block deletions via IW.  Not
sure..."

Can't IW use the IR to do its deletions?  Currently, deletions in IW are
implemented in DocumentsWriter.applyDeletes, which loads each segment with
SegmentReader.get() and applies the deletions, incurring term index load
overhead on every flush.  If IW has an internal IR, then the deletion process
can use it (instead of SegmentReader.get) and there should no longer be a
conflict between the IR and IW deletion processes.
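
To illustrate the idea, here is a minimal sketch of a writer-owned reader
pool.  It is a toy model and not Lucene code: ToySegmentReader, ToyReaderPool
and the loads counter are hypothetical names, and the "term index load" is
only simulated by counting how often a reader has to be opened.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a per-segment reader; constructing one represents the
// expensive term index load. This is NOT Lucene's SegmentReader.
class ToySegmentReader {
    final String segmentName;

    ToySegmentReader(String segmentName) {
        this.segmentName = segmentName;
    }
}

// Hypothetical pool owned by the writer: each segment's reader is opened
// once and reused by every later flush that needs to apply deletes.
class ToyReaderPool {
    private final Map<String, ToySegmentReader> readers = new HashMap<>();
    int loads = 0; // how many times a reader had to be opened

    ToySegmentReader get(String segmentName) {
        ToySegmentReader reader = readers.get(segmentName);
        if (reader == null) {
            reader = new ToySegmentReader(segmentName);
            readers.put(segmentName, reader);
            loads++;
        }
        return reader;
    }
}
```

In this sketch, repeated flushes that touch the same segments reuse the pooled
readers, so the simulated load cost is paid once per segment rather than once
per flush.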

> "we may have to block deletions via IW"

Hopefully they can be buffered.

Where else does the write lock need to be coordinated between IR and IW?

> "somehow IW & IR have to "split" the write lock else we may
need to merge deletions somehow."

This is a part I'd like to settle on before implementation starts.  It
looks like in IW, deletes are buffered as terms or queries until flushed, so
I don't think a lock is needed until the flush is performed?
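
A rough sketch of that buffering behaviour, under stated assumptions: this is
a toy model, not IndexWriter.  ToySegment, ToyBufferingWriter and the
lockAcquisitions counter are hypothetical, a document is reduced to a single
"field:value" string, and a ReentrantLock stands in for the index write lock.
Each buffered delete remembers how many documents were buffered when it
arrived, so it only affects documents added before it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Toy segment: a list of "field:value" docs plus one deleted flag per doc.
class ToySegment {
    final List<String> docs;
    final boolean[] deleted;

    ToySegment(List<String> docs) {
        this.docs = new ArrayList<>(docs);
        this.deleted = new boolean[docs.size()];
    }

    int liveDocCount() {
        int live = 0;
        for (boolean d : deleted) if (!d) live++;
        return live;
    }
}

// Hypothetical writer that buffers adds and deletes in memory and only
// takes the (stand-in) write lock while flushing.
class ToyBufferingWriter {
    private final ReentrantLock writeLock = new ReentrantLock();
    private final List<String> bufferedDocs = new ArrayList<>();
    // term -> number of buffered docs at the time the delete was requested
    private final Map<String, Integer> bufferedDeletes = new LinkedHashMap<>();
    final List<ToySegment> segments = new ArrayList<>();
    int lockAcquisitions = 0;

    void addDocument(String fieldAndValue) {
        bufferedDocs.add(fieldAndValue);
    }

    void deleteByTerm(String term) {
        bufferedDeletes.put(term, bufferedDocs.size());
    }

    void flush() {
        writeLock.lock();
        try {
            lockAcquisitions++;
            ToySegment fresh = bufferedDocs.isEmpty() ? null : new ToySegment(bufferedDocs);
            for (Map.Entry<String, Integer> e : bufferedDeletes.entrySet()) {
                // Older segments: every matching doc predates the delete.
                for (ToySegment seg : segments) mark(seg, e.getKey(), seg.docs.size());
                // New segment: only docs buffered before the delete arrived.
                if (fresh != null) mark(fresh, e.getKey(), e.getValue());
            }
            if (fresh != null) segments.add(fresh);
            bufferedDocs.clear();
            bufferedDeletes.clear();
        } finally {
            writeLock.unlock();
        }
    }

    private static void mark(ToySegment seg, String term, int upTo) {
        for (int i = 0; i < upTo; i++) {
            if (seg.docs.get(i).equals(term)) seg.deleted[i] = true;
        }
    }
}
```

In this model, adding a doc with X:77, deleting by term X:77 and then flushing
produces one segment whose only document is marked deleted, which is the case
Mike describes in the quoted message below, and the lock is taken once, at
flush time.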

For merge changes to the index, the deletion policy can be used to ensure
a reader still has access to the segments it needs from the main directory.
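
As a sketch of what "still has access" could mean in practice, the toy model
below defers removal of merged-away segments until no reader references them.
It is an assumption-laden illustration, not Lucene's deletion policy:
ToySegmentFiles and all of its methods are hypothetical names, and segments
are reduced to plain strings.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy bookkeeping for segment files; NOT Lucene's IndexDeletionPolicy.
class ToySegmentFiles {
    private final Set<String> onDisk = new HashSet<>();
    private final Set<String> obsolete = new HashSet<>();
    private final Map<String, Integer> readerRefs = new HashMap<>();

    void create(String segment) {
        onDisk.add(segment);
    }

    void readerOpened(String segment) {
        readerRefs.merge(segment, 1, Integer::sum);
    }

    void readerClosed(String segment) {
        int left = readerRefs.merge(segment, -1, Integer::sum);
        if (left <= 0) {
            readerRefs.remove(segment);
            // Last reader is gone: a merged-away segment can now be removed.
            if (obsolete.remove(segment)) onDisk.remove(segment);
        }
    }

    // A merge replaces the input segments with a single result segment.
    void mergeFinished(Collection<String> inputs, String result) {
        onDisk.add(result);
        for (String segment : inputs) {
            if (readerRefs.containsKey(segment)) {
                obsolete.add(segment); // still in use, keep until closed
            } else {
                onDisk.remove(segment);
            }
        }
    }

    boolean exists(String segment) {
        return onDisk.contains(segment);
    }
}
```

Here a segment that an open reader depends on survives the merge and is only
removed once that reader is closed.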


> "We have to test performance to measure the net add -> search latency.
For many apps this approach may be plenty fast.  If your IO system is
an SSD it could be extremely fast.  Swapping in RAMDir
just makes it faster w/o changing the basic approach."

It is true that this is the best way to start, and in fact it may be good
enough for many users.  Exposing a reader from IW could also help new users:
the delineation between the two is removed and Lucene becomes easier to use.

At the very least, this system allows a concurrently updateable IR and IW by
sharing the write lock, something that is currently not possible in
Lucene.

> "Besides the transaction log (for crash recovery), which should fit
"above" Lucene nicely, what else is needed for realtime beyond the
single-transaction support Lucene already provides?"

What we have described above (exposing IR via IW) will be sufficient and
realtime will live above it.



On Fri, Jan 9, 2009 at 11:15 AM, Michael McCandless <lucene@mikemccandless.com> wrote:

> Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
>
> > Are you referring to the IW.pendingCommit SegmentInfos variable?
>
> No, I'm referring to segmentInfos.  (pendingCommit is the "snapshot"
> of segmentInfos taken when committing...).
>
> > When you say "flushed" you are referring to the IW.prepareCommit method?
>
> No, I'm referring to "flush"... it writes a new segment but not a new
> segments_N, does not sync the files, and does not invoke the deletion
> policy.
>
> > I think step #1 is important and should be generally useful outside of
> > realtime search, however it's unclear how/when calls to IW.deleteDocument
> > will reflect in IW.getReader?
>
> You'd have to flush (to materialize pending deletions inside IW) then
> reopen the reader, to see any deletions done via the writer.  But I
> think instead realtime search would do deletions via the reader
> (because if you use IW you're updating deletes through the Directory =
> too slow).
>
> > Interleaving deletes with documents added isn't possible because if the
> > documents are in the IW ram buffer, they are not necessarily deleted
>
> Well, we buffer the delete and then on flush we materialize the
> delete.  So if you add a doc with field X=77, then delete-by-term
> X:77, then flush, you'll flush a 1 document segment whose only
> document is marked as deleted.
>
> But I think for realtime we don't want to be using IW's deletion at
> all.  We should do all deletes via the IndexReader.  In fact if IW has
> handed out a reader (via getReader()) and that reader (or a reopened
> derivative) remains open we may have to block deletions via IW.  Not
> sure... somehow IW & IR have to "split" the write lock else we may
> need to merge deletions somehow.
>
> > If this is swapped in later how is the system realtime except perhaps
> > deletes?
>
> We have to test performance to measure the net add -> search latency.
> For many apps this approach may be plenty fast.  If your IO system is
> an SSD it could be extremely fast.  Swapping in RAMDir
> just makes it faster w/o changing the basic approach.
>
> > Adding support for multiple transactions at once on IndexWriter outside
> > of the realtime transactions seems to require a lot of refactoring.
>
> Besides the transaction log (for crash recovery), which should fit
> "above" Lucene nicely, what else is needed for realtime beyond the
> single-transaction support Lucene already provides?
>
> Mike