lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Use of MultiFields.getFields() bad practice?
Date Wed, 05 Jan 2011 18:24:00 GMT
On Wed, Jan 5, 2011 at 12:35 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>> I do agree we should be returning null not EMPTY_DOCIDSET since
>> Filter.getDocIDSet's jdoc clearly states a null return means no docs are
>> accepted.
>
> I disagree. NULL is a stupid indicator, if something is empty it should
> return something empty. I also don't like somebody returning NULL instead of
> Collections.emptySet() or like that. We should document Filter and also
> Weight.scorer to always return non-null and also supply a
> Scorer.EMPTY_SCORER. If you want to optimize something, you can always
> change you loops to first do next() and then exit early.

I don't think it's so clear cut.  I think it depends on how the API is
used, and how advanced an API it is.

For example, Fields.terms("foobar") returns null if foobar is a
non-existent field.  I think that's appropriate, because to return a
"fake instance" loses information.  EG caller can no longer tell if a
given field exists or not.

In some cases the null return can make a difference in performance, eg
if BQ is OR'ing two terms, but one of them yields a null scorer
(matches no docs) then the scorer can [almost -- coord] rewrite to a
TermScorer with just the other term vs BooleanScorer.  We don't do
this today, but we should/could.

So my feeling is we have to take it case by case, and I think these
two cases (Weight.scorer and Filter.getDocIdSet) should keep their
current contract (null may be returned if no docs will match).

Regardless, I think each API should clearly state the contract
unambiguously.  Ie "this method never returns null", or, "this method
will return null if XYZ", or, "this method may return null if XYZ".

>> And I think I'm on the opposite side of the fence on null returns, at
> least for
>> advanced APIs -- I'd prefer not to hide information (by returning an empty
>> instance) in this case.  But other cases I think should never return null
> -- eg
>> once you have a non-null DocIdSet, then its .iterator() should never
> return
>> null.
>
> I agree. DocIdSet.iterator() should return EMPTY_DOCIDSETITERATOR.

Right, for an iterator from any class I think it should never return
null.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message