lucene-java-user mailing list archives

From "Mailing Lists Account" <>
Subject Correlating matched terms with Document
Date Tue, 21 Jan 2003 07:29:10 GMT

I have an unusual requirement. I am indexing a single HTML document and
searching it immediately for one or more keywords (Boolean or phrase query).
When the keywords are found in the document, I would like to know whether
each matched keyword comes from hyperlink text, a paragraph, or a heading
tag such as <h1> or <h2>.

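a) I cannot split the content into multiple fields (for example, one per
tag), because I need to run phrase queries, and a phrase query matches only
within a single field.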

b) During tokenization, I know exactly which tag each token comes from. Can
this be stored in the index as a user-defined flag or something similar, and
retrieved later? Looking at the API, it does not seem to be possible.
I see that I can associate a token type (such as "word" or "eol") with an
analyzer token, but this type is not stored in the index.

c) One option seems to be to re-tokenize the document after the search, as
some of the highlight/summary examples do. Then I can match the document
tokens against the query terms.
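
To make option (c) concrete, here is a rough sketch of what I have in mind.
It is plain Java and uses no Lucene calls at all; the class and method names
(TagAwareMatcher, TaggedToken, tokenize, tagsForPhrase) are made up for
illustration, and the regex-based HTML handling is an assumption that only
copes with simple, well-nested markup (no comments, entities, or scripts).
A real version would presumably reuse the same analyzer that built the index
so that the tokens line up with the indexed terms.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: re-tokenizes HTML and remembers the enclosing tag of each word.
final class TagAwareMatcher {

    /** A lower-cased word plus the innermost open tag at the point it appeared. */
    static final class TaggedToken {
        final String text;
        final String tag;
        TaggedToken(String text, String tag) { this.text = text; this.tag = tag; }
    }

    // Either an opening/closing tag (groups 1 and 2) or a run of letters/digits (group 3).
    private static final Pattern PIECE =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([A-Za-z0-9]+)");

    // Tags that never have a closing tag, so they must not be pushed on the stack.
    private static final Set<String> VOID_TAGS =
        new HashSet<>(Arrays.asList("br", "hr", "img", "input", "meta", "link"));

    static List<TaggedToken> tokenize(String html) {
        List<TaggedToken> tokens = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>();   // innermost open tag is at the head
        Matcher m = PIECE.matcher(html);
        while (m.find()) {
            if (m.group(3) != null) {
                // A word: tag it with the innermost open element ("body" if none).
                String tag = open.isEmpty() ? "body" : open.peek();
                tokens.add(new TaggedToken(m.group(3).toLowerCase(Locale.ROOT), tag));
            } else {
                String name = m.group(2).toLowerCase(Locale.ROOT);
                if (VOID_TAGS.contains(name)) {
                    continue;                       // e.g. <br>: no content, nothing to track
                }
                if (m.group(1).isEmpty()) {
                    open.push(name);                // opening tag
                } else {
                    open.removeFirstOccurrence(name); // closing tag: drop innermost match
                }
            }
        }
        return tokens;
    }

    /** For every occurrence of the phrase, returns the tag of each of its words. */
    static List<List<String>> tagsForPhrase(List<TaggedToken> tokens, String phrase) {
        String[] terms = phrase.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<List<String>> hits = new ArrayList<>();
        for (int i = 0; i + terms.length <= tokens.size(); i++) {
            boolean match = true;
            for (int j = 0; j < terms.length && match; j++) {
                match = tokens.get(i + j).text.equals(terms[j]);
            }
            if (match) {
                List<String> tags = new ArrayList<>();
                for (int j = 0; j < terms.length; j++) {
                    tags.add(tokens.get(i + j).tag);
                }
                hits.add(tags);
            }
        }
        return hits;
    }
}
```

Calling tokenize() on the HTML once and then tagsForPhrase() for each query
phrase gives one list of tags per occurrence, so a phrase that starts in a
paragraph and ends inside a link shows up with both tags. Because everything
stays in a single token stream, this does not run into the multi-field
problem from (a), at the cost of tokenizing the document a second time.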

