lucene-dev mailing list archives

From: Michael McCandless <luc...@mikemccandless.com>
Subject: Re: Realtime Search
Date: Fri, 30 Jan 2009 14:04:53 GMT
Jason Rutherglen <jason.rutherglen@gmail.com> wrote:

> > We'd also need to ensure when a merge kicks off, the SegmentReaders
> > used by the merging are not newly reopened but also "borrowed" from
>
> The IW merge code currently opens the SegmentReader with a 4096
> buffer size (different than the 1024 default), how will this case be
> handled?

I think we'd just use 1024 when merging.

> > reopen would then flush any added docs to new segments
>
> IR.reopen would call IW.flush?

I think it has to?  (Whether "it" is IR.reopen, or a class that sits
on top of both IR & IW, I'm not sure).

Ie the interface would be you add/delete/updateDoc, setNorm a bunch of
times, during which none of these changes are visible to your
currently open reader, followed by "reopen" to get a reader that then
sees those changes?
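To make the proposed semantics concrete, here is a minimal toy model. It assumes a hypothetical ToyWriter/ToyReader pair, which is not Lucene's IndexWriter/IndexReader API. Buffered adds stay invisible to an already-open reader, and reopen() flushes them into a new immutable segment and returns a reader over a snapshot of the segment list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of "buffer changes, then reopen to see them".
// ToyWriter, ToyReader and Segment are hypothetical, not Lucene classes.
public class ReopenSketch {

    // An immutable flushed segment: just a copy of the docs it holds.
    static final class Segment {
        final List<String> docs;
        Segment(List<String> docs) {
            this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
        }
    }

    // A point-in-time view over the segments that existed when it was opened.
    static final class ToyReader {
        private final List<Segment> segments;
        ToyReader(List<Segment> segments) { this.segments = List.copyOf(segments); }

        int numDocs() {
            int n = 0;
            for (Segment s : segments) n += s.docs.size();
            return n;
        }
    }

    static final class ToyWriter {
        private final List<Segment> segments = new ArrayList<>();
        private final List<String> buffered = new ArrayList<>();

        // Buffered only: no open reader can see this doc yet.
        void addDocument(String doc) { buffered.add(doc); }

        // Flush any buffered docs to a new segment, then return a fresh reader.
        ToyReader reopen() {
            if (!buffered.isEmpty()) {
                segments.add(new Segment(buffered));
                buffered.clear();
            }
            return new ToyReader(segments);
        }
    }

    public static void main(String[] args) {
        ToyWriter w = new ToyWriter();
        ToyReader r1 = w.reopen();
        w.addDocument("a");
        w.addDocument("b");
        System.out.println(r1.numDocs()); // 0: r1 predates the adds
        ToyReader r2 = w.reopen();
        System.out.println(r2.numDocs()); // 2: the flush made them visible
        System.out.println(r1.numDocs()); // still 0: r1 is a snapshot
    }
}
```

Copying the segment list at reopen time is what keeps an older reader stable while the writer keeps adding segments.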

(This is all still brainstorming at this point of course....)

> > When IW.commit is called, it also then asks each SegmentReader to
> > commit. Ie, IR.commit would not be used.
>
> Why is this? SegmentReader.commitChanges would be called instead?

Because IR.commit is doing other stuff (invoking deletion policy,
syncing newly referenced files, writing new segments file, rollback
logic on hitting an exception, etc.) that overlaps what IW.commit also
does.  It'd be great to factor this common stuff out so IW and IR
would share a single source.  (Yes, SR.commitChanges would be called
directly, I think).
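One way the shared commit logic might be factored out is a template method. This sketch rests on several assumptions. CommitTemplate, commitSegments() and the step strings are hypothetical. The steps only record their names and do not touch any files. Their order is illustrative and is not necessarily Lucene's actual commit sequence. The per-caller hook stands in for something like each SegmentReader committing its changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one shared home for the commit sequence that
// IR.commit and IW.commit would otherwise each implement.
public class SharedCommitSketch {

    static abstract class CommitTemplate {
        // Records which steps ran, in place of real file operations.
        final List<String> steps = new ArrayList<>();

        // Hook for the caller-specific part (e.g. per-segment commitChanges).
        protected abstract void commitSegments() throws Exception;

        // The shared sequence: identical no matter who triggers the commit.
        public final void commit() {
            try {
                commitSegments();
                steps.add("sync newly referenced files");
                steps.add("write new segments file");
                steps.add("invoke deletion policy");
            } catch (Exception e) {
                steps.add("rollback");
                throw new IllegalStateException("commit failed and was rolled back", e);
            }
        }
    }

    public static void main(String[] args) {
        CommitTemplate ok = new CommitTemplate() {
            @Override protected void commitSegments() { steps.add("commit segment changes"); }
        };
        ok.commit();
        System.out.println(ok.steps);

        CommitTemplate failing = new CommitTemplate() {
            @Override protected void commitSegments() throws Exception {
                throw new Exception("simulated I/O failure");
            }
        };
        try {
            failing.commit();
        } catch (IllegalStateException e) {
            System.out.println(failing.steps); // only the rollback step ran
        }
    }
}
```

Because commit() is final, IW and IR could differ only in the hook and could not drift apart in the shared steps.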

> > Then when reopen is called, we must internally reopen that clone()
> > such that its deleted docs are carried over to the newly reopened
> > reader and newly flushed docs from IW are visible as new
> > SegmentReaders.
>
> If deletes are made to the external reader (meaning the one obtained
> by IW.getReader), then deletes are made via IW.deleteDocument, then
> reopen is called, what happens in this case? We will need to merge
> the del docs from the internal clone into the newly reopened reader?

I guess we could merge them.  Ie, deletes made through the reader (by
docID) are immediately visible, but deletes made through the writer
are buffered until a flush or reopen?

Still, I don't like exposing two ways to do deletions, with two
different behaviours (buffered or not).  It's weird.  Maybe, instead,
all deletes done via IW would be immediate?

It seems like either 1) all deletes are buffered until reopen, or 2)
all deletes are immediately materialized.  I think half/half is too
strange.
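The two consistent options can be contrasted in a small toy model. ToyIndex and Policy are hypothetical names and are not Lucene classes. Deleted docs are tracked in a java.util.BitSet. Under the buffered policy, reopen() ORs the pending deletes into the visible ones, which is roughly the "merge" of reader and writer deletes mentioned above.

```java
import java.util.BitSet;

// Hypothetical sketch of the two delete semantics: buffered until reopen
// versus immediately materialized. Not Lucene's API.
public class DeletePolicySketch {

    enum Policy { BUFFERED_UNTIL_REOPEN, IMMEDIATE }

    static final class ToyIndex {
        private final Policy policy;
        private final BitSet visibleDeletes = new BitSet(); // what the current reader sees
        private final BitSet pendingDeletes = new BitSet(); // buffered, not yet visible

        ToyIndex(Policy policy) { this.policy = policy; }

        void deleteDocument(int docId) {
            if (policy == Policy.IMMEDIATE) {
                visibleDeletes.set(docId);  // materialized right away
            } else {
                pendingDeletes.set(docId);  // waits for reopen()
            }
        }

        boolean isDeleted(int docId) { return visibleDeletes.get(docId); }

        // Merge buffered deletes into the visible set, then reset the buffer.
        void reopen() {
            visibleDeletes.or(pendingDeletes);
            pendingDeletes.clear();
        }
    }

    public static void main(String[] args) {
        ToyIndex buffered = new ToyIndex(Policy.BUFFERED_UNTIL_REOPEN);
        buffered.deleteDocument(3);
        System.out.println(buffered.isDeleted(3)); // false until reopen
        buffered.reopen();
        System.out.println(buffered.isDeleted(3)); // true

        ToyIndex immediate = new ToyIndex(Policy.IMMEDIATE);
        immediate.deleteDocument(3);
        System.out.println(immediate.isDeleted(3)); // true at once
    }
}
```

Either policy is internally consistent. The "half/half" case would amount to routing reader deletes to the immediate branch and writer deletes to the buffered one.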

> > the IR becomes transactional as well -- deletes are not visible
> > immediately until reopen is called
>
> Interesting. I'd rather somehow merge the IW and external reader's
> deletes, otherwise it seems like we're radically changing how IR
> works. Perhaps the IW keeps a copy of the external IR that has the
> write lock (thinking of IR.clone where the write lock is passed onto
> the latest clone). This way IW.getReader is about the same as
> reopen/clone (because it will call reopen on presumably the latest
> IR).

We'd only be "radically changing" how the RealTimeReader works.

I think the initial approach here might be to simply open up enough
package-private APIs or subclass-ability on IR and IW so that we can
experiment with these realtime ideas.  Then we iterate w/ different
experiments to see how things flesh out...

Actually could you redo LUCENE-1516 now that LUCENE-1314 is in?

Mike


