lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Use of MultiFields.getFields() bad practice?
Date Wed, 05 Jan 2011 17:23:47 GMT
I just cleaned up a few more unnecessary ones.

Really, we need to break out atomic vs composite IndexReaders.
Effectively, we already have, it's just that it's dynamically typed
(you hit exc's at runtime) not statically typed.  I'd like to make it
statically typed so it's clear which APIs take what readers.  EG
Filter.getDocIDSet always takes an atomic reader.

I do agree we should be returning null not EMPTY_DOCIDSET since
Filter.getDocIDSet's jdoc clearly states a null return means no docs
are accepted.

And I think I'm on the opposite side of the fence on null returns, at
least for advanced APIs -- I'd prefer not to hide information (by
returning an empty instance) in this case.  But other cases I think
should never return null -- eg once you have a non-null DocIdSet, then
its .iterator() should never return null.

Mike

On Tue, Jan 4, 2011 at 5:38 PM, Smiley, David W. <dsmiley@mitre.org> wrote:
> I'm looking through the trunk code on various implementations of Filter.getDocIdSet(IndexReader).
 It is often needed to get an instance of Terms and then do other work from there.  Looking
at MultiTermQueryWrapperFilter, the first set of lines to do this is:
>
>  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
>    final Fields fields = MultiFields.getFields(reader);
>    if (fields == null) {
>      // reader has no fields
>      return DocIdSet.EMPTY_DOCIDSET;
>    }
>
>    final Terms terms = fields.terms(query.field);
>    if (terms == null) {
>      // field does not exist
>      return DocIdSet.EMPTY_DOCIDSET;
>    }
> ....
>
> When I look at the javadoc for MultiFields.getFields(reader), I see some Javadoc (apparently
written by Michael McCandless, CC'ed), with the following javadoc snippet :
>   *  <p><b>NOTE</b>: this is a slow way to access postings.
>   *  It's better to get the sub-readers (using {@link
>   *  Gather}) and iterate through them
>   *  yourself.
>
> If this is the case, then why is MultiFields.getFields(reader) used 43 times across Lucene/Solr
whereas ReaderUtil.Gather is only used 5 times?  If it's a TODO then perhaps a JIRA issue
needs to be created.  I don't find helpful examples of how to use ReaderUtil.Gather... the
existing 5 uses are all within MultiFields & ReaderUtil.
>
> FWIW, in a Lucene Filter I wrote, I've been using this code snippet successfully:
>
>        Terms terms = reader.fields().terms(fieldName);
>
> On a related topic, I think that if Filter.getDocIdSet() is documented that it may return
null, then it's better code design to consequently return null in appropriate circumstances
instead of DocIdSet.EMPTY_DOCIDSET.  That said, FWIW, I prefer API design that favors non-null
when you can get away with it, like this case.  So I'm in favor of making getDocIdSet() be
documented to not return null (and follow through throughout the codebase).  Admittedly some
callers might have short-circuit logic.
>
> ~ David Smiley

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message