lucene-java-user mailing list archives

From Grant Ingersoll
Subject Re: Sentence boundary storage
Date Mon, 31 Oct 2005 13:43:39 GMT
Inline below

Chris Hostetter wrote:

>: Actually, I was thinking of writing something along the lines of
>: Span*BoundaryQuery where it would be more explicit than what was
>: described below.  You could say SpanSentence and say you want the terms
>
>I'm not clear on how such a SpanSentence class would work -- the index
>must contain info about where sentence boundaries are, which means users
>would need a special analyzer/tokenizer to create Terms for those
>boundaries, and would need to tell the SpanSentence class what those
>tokens are.
Right, I was thinking of providing it as a package of code that would
store the boundary tokens needed at indexing time, probably by extending
the StandardTokenizer/Analyzer. The Span classes would need to take in
the appropriate information and create the underlying SpanNotQuery, etc.,
as discussed in the previous email.
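
To make the idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. It shows the two pieces discussed: a tokenizer that emits an extra marker token at each sentence end, and a check that two terms occur with no marker between them. That check is the condition a SpanNotQuery wrapping a SpanNearQuery (include) and a SpanTermQuery on the marker (exclude) would express against a real index. The class name, the marker string "_sent_end_", and the naive punctuation-based sentence detection are all assumptions made for illustration; none of them come from Lucene or from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceBoundarySketch {
    // Hypothetical marker token; a real analyzer would need a value
    // that cannot collide with ordinary terms in the field.
    public static final String BOUNDARY = "_sent_end_";

    // Splits on whitespace, lowercases each word, and appends a marker
    // token after any word ending in '.', '!' or '?'. This is a naive
    // stand-in for real sentence detection (it breaks on abbreviations).
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            boolean endsSentence = raw.matches(".*[.!?]$");
            String word = raw.replaceAll("[^\\p{L}\\p{Nd}]", "")
                             .toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                tokens.add(word);
            }
            if (endsSentence) {
                tokens.add(BOUNDARY);
            }
        }
        return tokens;
    }

    // True if some occurrence of a and some occurrence of b have no
    // boundary marker between them, i.e. they share a sentence.
    public static boolean inSameSentence(List<String> tokens, String a, String b) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(a)) {
                continue;
            }
            for (int j = 0; j < tokens.size(); j++) {
                if (!tokens.get(j).equals(b)) {
                    continue;
                }
                int lo = Math.min(i, j);
                int hi = Math.max(i, j);
                if (!tokens.subList(lo, hi).contains(BOUNDARY)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Lucene indexes text. Spans match positions.");
        System.out.println(tokens);
        System.out.println(inSameSentence(tokens, "lucene", "text"));
        System.out.println(inSameSentence(tokens, "text", "spans"));
    }
}
```

In a real index the marker would be an ordinary term in the same field, so its position sits between the last word of one sentence and the first word of the next. Whether it should take up a position of its own is a design choice that affects phrase and slop matching across the boundary.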

>It sounds like maybe you could write some convenience methods to construct
>the SpanQuery structure for you, but I don't see any practical way to make
>a generic SpanSentence class.
>
>: codify what is discussed below into a few convenience Span queries, or
>: maybe we should just write it up better and put it on the wiki or something...
>
>If you implement it in an actual application (instead of just theorizing
>it like Doug and I have done) then I definitely think it would make a
>useful HOWTO if you have time to write one up...

I have added a fair amount to the current Lucene Demo for my ApacheCon
talk in December, and it will be freely available then. I might consider
putting in a proof of concept/demo of how to do such a thing.
I will try to write it up when I get the chance.

