lucene-dev mailing list archives

From Jason Rutherglen
Subject Re: Realtime Search
Date Sat, 31 Jan 2009 00:23:31 GMT
> deletes made through reader (by docID) are immediately visible, but
> through writer are buffered until a flush or reopen?

This is what I was thinking: IW buffers deletes, IR does not. Making
IW deletes visible immediately by applying them to the IR makes sense
as well.
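
A toy sketch of the two delete behaviours, for illustration only.
ToyReader and ToyWriter are hypothetical stand-ins, not Lucene classes,
and the toy writer deletes by docID for brevity. The applyImmediately
flag is an assumption that models the alternative of applying IW
deletes to the IR right away.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy stand-in for a reader: deletes by docID take effect at once. */
final class ToyReader {
    private final BitSet deleted;   // one bit per docID
    private final int maxDoc;

    ToyReader(int maxDoc) {
        this.maxDoc = maxDoc;
        this.deleted = new BitSet(maxDoc);
    }

    void deleteDocument(int docID) { deleted.set(docID); }
    boolean isDeleted(int docID) { return deleted.get(docID); }
    int numDocs() { return maxDoc - deleted.cardinality(); }
}

/** Toy stand-in for a writer: deletes are buffered unless told otherwise. */
final class ToyWriter {
    private final ToyReader reader;
    private final boolean applyImmediately;   // hypothetical switch
    private final List<Integer> bufferedDeletes = new ArrayList<>();

    ToyWriter(ToyReader reader, boolean applyImmediately) {
        this.reader = reader;
        this.applyImmediately = applyImmediately;
    }

    void deleteDocument(int docID) {
        if (applyImmediately) {
            reader.deleteDocument(docID);   // visible right away
        } else {
            bufferedDeletes.add(docID);     // invisible until flush()
        }
    }

    /** Applies every buffered delete to the reader, then clears the buffer. */
    void flush() {
        for (int docID : bufferedDeletes) {
            reader.deleteDocument(docID);
        }
        bufferedDeletes.clear();
    }
}

final class BufferedDeletesSketch {
    public static void main(String[] args) {
        ToyReader reader = new ToyReader(10);
        ToyWriter writer = new ToyWriter(reader, false);

        reader.deleteDocument(3);   // reader delete: immediate
        writer.deleteDocument(7);   // writer delete: buffered
        System.out.println(reader.numDocs());   // 9, docID 7 still live

        writer.flush();
        System.out.println(reader.numDocs());   // 8, buffered delete applied
    }
}
```

Constructing the toy writer with applyImmediately set to true gives a
single, unbuffered behaviour for both paths.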

What should be the behavior of IW.updateDocument?

LUCENE-1314 is in, and we've agreed that IR.reopen causes an IW.flush,
so I'll continue with the LUCENE-1516 patch.
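
Roughly, the reopen-causes-flush contract could look like the toy
model below. SketchWriter and SnapshotReader are hypothetical names
used only for this sketch, not Lucene classes, and documents are plain
strings. It assumes an already open reader keeps its point-in-time
view while reopen() returns a new reader.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy point-in-time reader: sees only what was flushed when it was opened. */
final class SnapshotReader {
    private final List<String> docs;
    private final SketchWriter writer;

    SnapshotReader(List<String> flushedDocs, SketchWriter writer) {
        // Copy, so later flushes do not change this reader's view.
        this.docs = Collections.unmodifiableList(new ArrayList<>(flushedDocs));
        this.writer = writer;
    }

    int numDocs() { return docs.size(); }

    /** Flushes the writer first, then returns a fresh reader. */
    SnapshotReader reopen() {
        writer.flush();
        return writer.getReader();
    }
}

/** Toy writer: added docs stay buffered until flush(). */
final class SketchWriter {
    private final List<String> flushed = new ArrayList<>();
    private final List<String> buffered = new ArrayList<>();

    void addDocument(String doc) { buffered.add(doc); }

    void flush() {
        flushed.addAll(buffered);
        buffered.clear();
    }

    /** Returns a reader over the flushed docs only (does not flush). */
    SnapshotReader getReader() { return new SnapshotReader(flushed, this); }
}

final class ReopenFlushSketch {
    public static void main(String[] args) {
        SketchWriter writer = new SketchWriter();
        SnapshotReader reader = writer.getReader();

        writer.addDocument("doc-a");
        writer.addDocument("doc-b");
        System.out.println(reader.numDocs());      // 0, adds are buffered

        SnapshotReader reopened = reader.reopen(); // triggers writer.flush()
        System.out.println(reopened.numDocs());    // 2
        System.out.println(reader.numDocs());      // 0, old view unchanged
    }
}
```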

On Fri, Jan 30, 2009 at 6:04 AM, Michael McCandless wrote:

> Jason Rutherglen wrote:
> > > We'd also need to ensure when a merge kicks off, the SegmentReaders
> > > used by the merging are not newly reopened but also "borrowed" from
> >
> > The IW merge code currently opens the SegmentReader with a 4096
> > buffer size (different than the 1024 default), how will this case be
> > handled?
> I think we'd just use 1024 when merging.
> > > reopen would then flush any added docs to new segments
> >
> > IR.reopen would call IW.flush?
> I think it has to?  (Whether "it" is IR.reopen, or a class that sits
> on top of both IR & IW, I'm not sure).
> Ie the interface would be you add/delete/updateDoc, setNorm a bunch of
> times, during which none of these changes are visible to your
> currently open reader, followed by "reopen" to get a reader that then
> sees those changes?
> (This is all still brainstorming at this point of course....)
> > > When IW.commit is called, it also then asks each SegmentReader to
> > > commit. Ie, IR.commit would not be used.
> >
> > Why is this? SegmentReader.commitChanges would be called instead?
> Because IR.commit is doing other stuff (invoking deletion policy,
> syncing newly referenced files, writing new segments file, rollback
> logic on hitting an exception, etc.) that overlaps what IW.commit also
> does.  It'd be great to factor this common stuff out so IW and IR
> would share a single source.  (Yes, SR.commitChanges would be called
> directly, I think).
> > > Then when reopen is called, we must internally reopen that clone()
> > > such that its deleted docs are carried over to the newly reopened
> > > reader and newly flushed docs from IW are visible as new
> > > SegmentReaders.
> >
> > If deletes are made to the external reader (meaning the one obtained
> > by IW.getReader), then deletes are made via IW.deleteDocument, then
> > reopen is called, what happens in this case? We will need to merge
> > the del docs from the internal clone into the newly reopened reader?
> I guess we could merge them.  Ie, deletes made through reader (by
> docID) are immediately visible, but through writer are
> buffered until a flush or reopen?
> Still, I don't like exposing two ways to do deletions, with two
> different behaviours (buffered or not).  It's weird.  Maybe, instead,
> all deletes done via IW would be immediate?
> It seems like either 1) all deletes are buffered until reopen, or 2)
> all deletes are immediately materialized.  I think half/half is too
> strange.
> > > the IR becomes transactional as well -- deletes are not visible
> > > immediately until reopen is called
> >
> > Interesting. I'd rather somehow merge the IW and external reader's
> > deletes, otherwise it seems like we're radically changing how IR
> > works. Perhaps the IW keeps a copy of the external IR that has the
> > write lock (thinking of IR.clone where the write lock is passed onto
> > the latest clone). This way IW.getReader is about the same as
> > reopen/clone (because it will call reopen on presumably the latest
> > IR).
> We'd only be "radically changing" how the RealTimeReader works.
> I think the initial approach here might be to simply open up enough
> package-private APIs or subclass-ability on IR and IW so that we can
> experiment with these realtime ideas.  Then we iterate w/ different
> experiments to see how things flesh out...
> Actually could you redo LUCENE-1516 now that LUCENE-1314 is in?
> Mike