lucene-java-user mailing list archives

From Paul Elschot
Subject Re: Sentence and Paragraph searching
Date Fri, 01 Jul 2005 19:27:02 GMT
On Friday 01 July 2005 21:13, Dave Kor wrote:
> Quoting Peter Laurinc:
> > Hi,
> >
> > I'm a newbie to Lucene.
> > I want to ask how to implement a search for a phrase that must fall
> > within a single sentence/paragraph.
> > I have seen some examples that change term positions, but I think
> > that this is not the way, because it breaks classic proximity search
> > (if one word is at the end of a sentence and the second is at the
> > beginning of the next one).
> Most NLP toolkits have a sentence and paragraph boundary detector that can be
> used to separate a single document into its constituent sentences and
> paragraphs. The two NLP toolkits I am familiar with are OpenNLP (Open Source)
> and Alias-i's LingPipe (Commercial) libraries. These toolkits can be used in
> several ways to achieve what you want.
> If you are ONLY interested in searching for individual sentences, then you can
> use the toolkits to create an index of sentences instead of an index of
> documents.
> Alternatively, you can encode the sentence boundaries found by these toolkits
> within the documents you are indexing, for example using special characters or
> as a separate field in Lucene. After every search, do an extra check to ensure
> that Lucene did not match across sentence boundaries.
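
As a rough illustration of the splitting and the extra check described above, here is a minimal sketch that uses the JDK's own java.text.BreakIterator as a stand-in for the sentence detector of an NLP toolkit. The class and method names (SentenceSplitter, sentences, inSameSentence) are made up for this example, and the word matching is deliberately naive. Each string returned by sentences() could be indexed as its own document, or inSameSentence() could serve as the check applied to a hit after the search.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene or any NLP toolkit.
final class SentenceSplitter {

    /** Splits text into sentences with the JDK's built-in boundary detector. */
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.add(sentence);
            }
        }
        return out;
    }

    /** Returns true if at least one sentence contains both words (case-insensitive). */
    static boolean inSameSentence(String text, String a, String b) {
        String wa = a.toLowerCase(Locale.ROOT);
        String wb = b.toLowerCase(Locale.ROOT);
        for (String sentence : sentences(text)) {
            // Naive tokenization: split on runs of non-word characters.
            List<String> words = Arrays.asList(sentence.toLowerCase(Locale.ROOT).split("\\W+"));
            if (words.contains(wa) && words.contains(wb)) {
                return true;
            }
        }
        return false;
    }
}
```

A real setup would probably swap in the toolkit's detector for BreakIterator and reuse the same analyzer for tokenization that the index uses.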

Or try using a SpanNotQuery to make sure that the sentence or paragraph
border is not contained in matches in the same field.
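
To make the idea concrete: in Lucene this would presumably mean a SpanNearQuery over the search terms as the "include" part of the SpanNotQuery and a SpanTermQuery on a boundary marker as the "exclude" part, which assumes that a marker token is indexed at every sentence or paragraph border in the same field. The sketch below is not Lucene code. It is a plain-Java model of the filtering only, with made-up names (SpanNotSketch, Span, withoutBoundaries), to show which matches survive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the span-not idea; it does not use or reproduce Lucene's classes.
final class SpanNotSketch {

    /** A span of token positions: start is inclusive, end is exclusive. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Keeps only the candidate spans that contain no boundary marker position. */
    static List<Span> withoutBoundaries(List<Span> candidates, int[] boundaryPositions) {
        List<Span> kept = new ArrayList<>();
        for (Span candidate : candidates) {
            boolean crossesBoundary = false;
            for (int position : boundaryPositions) {
                if (position >= candidate.start && position < candidate.end) {
                    crossesBoundary = true;
                    break;
                }
            }
            if (!crossesBoundary) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

For the token stream "the cat sat EOS the dog barked EOS" (positions 0 to 7, with the assumed marker EOS at positions 3 and 7), a proximity match covering "cat sat" spans positions 1 to 2 and is kept, while a match covering "sat ... the" spans positions 2 to 4, contains the marker at position 3, and is dropped.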

Paul Elschot
