lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Use of MultiFields.getFields() bad practice?
Date Wed, 05 Jan 2011 17:35:36 GMT
> I do agree we should be returning null not EMPTY_DOCIDSET since
> Filter.getDocIDSet's jdoc clearly states a null return means no docs are
> accepted.

I disagree. NULL is a stupid indicator, if something is empty it should
return something empty. I also don't like somebody returning NULL instead of
Collections.emptySet() or like that. We should document Filter and also
Weight.scorer to always return non-null and also supply a
Scorer.EMPTY_SCORER. If you want to optimize something, you can always
change you loops to first do next() and then exit early.

> And I think I'm on the opposite side of the fence on null returns, at
least for
> advanced APIs -- I'd prefer not to hide information (by returning an empty
> instance) in this case.  But other cases I think should never return null
-- eg
> once you have a non-null DocIdSet, then its .iterator() should never
return
> null.

I agree. DocIdSet.iterator() should return EMPTY_DOCIDSETITERATOR.

> Mike
> 
> On Tue, Jan 4, 2011 at 5:38 PM, Smiley, David W. <dsmiley@mitre.org>
> wrote:
> > I'm looking through the trunk code on various implementations of
> Filter.getDocIdSet(IndexReader).  It is often needed to get an instance of
> Terms and then do other work from there.  Looking at
> MultiTermQueryWrapperFilter, the first set of lines to do this is:
> >
> >  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
> >    final Fields fields = MultiFields.getFields(reader);
> >    if (fields == null) {
> >      // reader has no fields
> >      return DocIdSet.EMPTY_DOCIDSET;
> >    }
> >
> >    final Terms terms = fields.terms(query.field);
> >    if (terms == null) {
> >      // field does not exist
> >      return DocIdSet.EMPTY_DOCIDSET;
> >    }
> > ....
> >
> > When I look at the javadoc for MultiFields.getFields(reader), I see some
> Javadoc (apparently written by Michael McCandless, CC'ed), with the
> following javadoc snippet :
> >   *  <p><b>NOTE</b>: this is a slow way to access postings.
> >   *  It's better to get the sub-readers (using {@link
> >   *  Gather}) and iterate through them
> >   *  yourself.
> >
> > If this is the case, then why is MultiFields.getFields(reader) used 43
times
> across Lucene/Solr whereas ReaderUtil.Gather is only used 5 times?  If
it's a
> TODO then perhaps a JIRA issue needs to be created.  I don't find helpful
> examples of how to use ReaderUtil.Gather... the existing 5 uses are all
within
> MultiFields & ReaderUtil.
> >
> > FWIW, in a Lucene Filter I wrote, I've been using this code snippet
> successfully:
> >
> >        Terms terms = reader.fields().terms(fieldName);
> >
> > On a related topic, I think that if Filter.getDocIdSet() is documented
that it
> may return null, then it's better code design to consequently return null
in
> appropriate circumstances instead of DocIdSet.EMPTY_DOCIDSET.  That said,
> FWIW, I prefer API design that favors non-null when you can get away with
> it, like this case.  So I'm in favor of making getDocIdSet() be documented
to
> not return null (and follow through throughout the codebase).  Admittedly
> some callers might have short-circuit logic.
> >
> > ~ David Smiley
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional
> commands, e-mail: dev-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message