lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1749) FieldCache introspection API
Date Sat, 01 Aug 2009 12:10:14 GMT


Michael McCandless commented on LUCENE-1749:

This was an excellent idea, and it's great that it uncovered some
dangerous and very unexpected places where we are passing top-level
reader to the FieldCache (eg that explain() could suddenly populate
the FieldCache w/ top-level values is quite shocking!).

ReaderUtil.subSearcher is doing the same thing as

I love the RAMUsageEstimator... we have other places that estimate RAM
(eg IndexWriter does so for added & deleted docs) that we should
eventually cut over to this new API.

I particularly love the new class named Insanity:

  public static Insanity[] checkSanity(FieldCache cache)
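The two checks listed in the issue description (the same reader/field cached with different types or parsers, and the same field/type/parser cached in both a reader and its sub-reader) can be sketched roughly as follows. This is a minimal illustration, not the patch's checkSanity: Entry, Kind, Problem and the string reader ids are hypothetical simplifications of whatever the FieldCache actually records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SanitySketch {
    // Hypothetical kinds of "insanity"; not Lucene's actual types.
    enum Kind { TYPE_OR_PARSER_MISMATCH, CACHED_IN_READER_AND_SUBREADER }

    // Hypothetical, simplified view of one cache entry.
    // parentReaderId is null for a top-level reader.
    record Entry(String readerId, String parentReaderId, String field, String type, String parser) {}

    record Problem(Kind kind, Entry first, Entry second) {}

    static List<Problem> checkSanity(List<Entry> entries) {
        List<Problem> problems = new ArrayList<>();

        // Check 1: one reader/field pair cached under different types or parsers.
        Map<String, Entry> firstSeen = new HashMap<>();
        for (Entry e : entries) {
            Entry prev = firstSeen.putIfAbsent(e.readerId() + "/" + e.field(), e);
            if (prev != null && !(prev.type().equals(e.type()) && prev.parser().equals(e.parser()))) {
                problems.add(new Problem(Kind.TYPE_OR_PARSER_MISMATCH, prev, e));
            }
        }

        // Check 2: the same field/type/parser cached for a reader AND for one of
        // its direct sub-readers (the values are effectively held in RAM twice).
        for (Entry child : entries) {
            if (child.parentReaderId() == null) continue;
            for (Entry parent : entries) {
                if (parent.readerId().equals(child.parentReaderId())
                        && parent.field().equals(child.field())
                        && parent.type().equals(child.type())
                        && parent.parser().equals(child.parser())) {
                    problems.add(new Problem(Kind.CACHED_IN_READER_AND_SUBREADER, parent, child));
                }
            }
        }
        return problems;
    }
}
```

The sketch only compares direct parent/child pairs; a real checker would also need to walk deeper reader hierarchies.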

MultiDocIdSet/Iterator makes me a bit nervous, because it's further
"propagating" a non-segment-based iterator deeper into Lucene than I
think we want to.  It's similar to eg using
DirectoryReader.MultiTermDocs (what Lucene used to do), instead of
stepping through the segments yourself.
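For contrast, "stepping through the segments yourself" can be sketched as below, assuming each segment's matches are available as ascending, segment-relative doc ids together with each segment's maxDoc. The class and method names are hypothetical and not part of Lucene's API; the point is that the top-level doc id is derived by adding a per-segment docBase, rather than by wrapping all segments in one merged iterator.

```java
import java.util.ArrayList;
import java.util.List;

class PerSegmentSketch {
    // perSegmentHits[seg] holds segment-relative doc ids for segment seg;
    // segmentMaxDocs[seg] is that segment's maxDoc (hypothetical inputs).
    static List<Integer> collect(int[][] perSegmentHits, int[] segmentMaxDocs) {
        List<Integer> topLevel = new ArrayList<>();
        int docBase = 0;
        for (int seg = 0; seg < perSegmentHits.length; seg++) {
            for (int segDoc : perSegmentHits[seg]) {
                // Re-base the segment-relative doc into the top-level id space.
                topLevel.add(docBase + segDoc);
            }
            // The next segment's ids start right after this segment's maxDoc.
            docBase += segmentMaxDocs[seg];
        }
        return topLevel;
    }
}
```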

Also, shouldn't explain most closely match what was done during
searching (ie, run "per segment")?  So simply pushing explain down to
the sub-reader that has the doc seems appropriate?  Ie we want it to
share as much of the code path as possible with how searching was in
fact done?

EG for ConstantScoreQuery.explain, it seems like we should 1) locate
the sub-reader that this doc falls in, and 2) get a scorer against
that reader, then 3) build up the explanation from that?  Likewise for

In fact.... maybe we should simply fix IndexSearcher.explain to do
this for all queries?  Ie, get the top-level weight, locate sub-reader
that has the doc, un-base the doc, and then invoke QueryWeight.explain
with that sub-reader and un-based doc?  Then we don't have to do
anything special for each query.  I think QueryWeight.scorer()
shouldn't be expected to handle a "top level reader" being passed in.
Ie, higher up in Lucene we should do that switch, so that we don't
have to do it (this "valuesFromSubReaders" arg) for every scorer.
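A minimal sketch of that top-level dispatch, assuming the top-level reader exposes each sub-reader's starting doc id as an ascending docStarts array (docStarts[0] == 0). SegmentExplainer stands in for "invoke QueryWeight.explain with that sub-reader"; it and the other names here are hypothetical, not Lucene's actual API.

```java
class ExplainDispatchSketch {
    // Hypothetical stand-in for calling the weight's explain against one sub-reader.
    interface SegmentExplainer {
        String explain(int segmentIndex, int segmentDoc);
    }

    // Binary search for the LAST segment whose start is <= doc. Picking the last
    // match skips empty segments, which share a start with the following segment.
    static int subIndex(int doc, int[] docStarts) {
        int lo = 0, hi = docStarts.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (docStarts[mid] <= doc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Locate the sub-reader that has the doc, un-base the doc, then delegate,
    // so individual queries need no special top-level handling.
    static String explain(int doc, int[] docStarts, SegmentExplainer explainer) {
        int seg = subIndex(doc, docStarts);
        return explainer.explain(seg, doc - docStarts[seg]);
    }
}
```

Doing the lookup once, at the searcher level, keeps explain on the same per-segment code path that searching used.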

Hmm: why do we even have explain at both the QueryWeight and Scorer
"levels"?  It seems like we should pick one level and do it there,
consistently.  Most queries seem to only implement the QueryWeight one
and often simply throw UnsupportedOperationException in the Scorer's explain, but eg PhraseQuery
implements in both places.

(BTW: I'll be offline for approx the next 36 hours or so!)

> FieldCache introspection API
> ----------------------------
>                 Key: LUCENE-1749
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Priority: Minor
>             Fix For: 2.9
>         Attachments: fieldcache-introspection.patch, LUCENE-1749-hossfork.patch, LUCENE-1749.patch,
>                      LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch, LUCENE-1749.patch,
>                      LUCENE-1749.patch, LUCENE-1749.patch
> FieldCache should expose an Expert level API for runtime introspection of the FieldCache
> to provide info about what is in the FieldCache at any given moment.  We should also provide
> utility methods for sanity checking that the FieldCache doesn't contain anything "odd"...
>    * entries for the same reader/field with different types/parsers
>    * entries for the same field/type/parser in a reader and its subreader(s)
>    * etc...

