lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <>
Subject Re: Multiword Highlighting
Date Fri, 16 Feb 2007 17:20:40 GMT
> 1> my test cases throw some exceptions with the code as-is. The 
> spans.get(0)
> is a problem in that it's not guaranteed that the spans returned will 
> have
> anything in them. Also, I don't think that the test for 
> reqSpans.get(0).next
> in queryClauses[i].isRequired is correct (even if it doesn't throw
> exceptions). Isn't the sense there that we want to include the spans 
> if we
> *do* have entries??

You should get a Spans object back no matter what, hence the .get(0). If 
the Spans object returned has no Spans in it, then the first call to 
next will return false.
> 2> But more importantly, I think this throws things in the "span bucket"
> across documents. Consicer two documents with text "a b c d e f" is in 
> one
> document, and "x y z" is in another, and we query on "a AND z", it seems
> like extractSpansFromTermQuery would return one span from each document,
> which would satisfy the tests in getSpansFromBooleanQuery 
> inappropriately.
This might be the case. I have not considered it...I am working with 
real hit highlighting and so I only work with a single document at a time.
> Is it just me or is working with Spans really intended to be "one pass
> through and only forward"? There are several places in the SpansExtractor
> code where we want to ask "are there any spans in here?". But to ask 
> that,
> you have to call next(). Which changes the state of the Spans such 
> that you
> have to be really careful when you use any Spans that have had this test
> performed already and do a do..while (; rather than a 
> while (
> {}..... Ditto with skipTo.
Could be. I'll do some testing.

I haven't thrown any exceptions yet, but I am working with a single doc 
in a memoryindex. So far I have yet to see a problem. I will keep looking.

- Mark

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message