lucene-java-user mailing list archives

From "Joshua O'Madadhain"
Subject Re: Indexing synonyms
Date Sun, 10 Nov 2002 18:23:04 GMT
On Sun, 10 Nov 2002, Aaron Galea wrote:

> I need to create a filter that extends a tokenfilter whose purpose is
> to generate some synonyms for words in the document using Wordnet.
> Well searching for synonyms using wordnet is not that problematic but
> I need to add the synonym words to Lucene tokenstream before they are
> passed for indexing. However TokenStream class does not support any
> add method. Did anyone ever needed to do this? Can someone suggest an
> alternative of how to add some synonym words to the index?

I have done some related work with Lucene involving query expansion
(although not with Wordnet), but it's not clear to me what you're trying
to accomplish with synonyms.  It sounds to me as though you want to store
all the synonyms of a word with the word in the index, in such a way that 
if a query involves any of them, they all respond.
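
To make that reading concrete, here is a rough sketch of the index-time idea using a toy in-memory inverted index. This is not Lucene's API: the class SynonymIndex, its methods, and the synonym map (a stand-in for a Wordnet lookup) are all hypothetical names made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (hypothetical, not a Lucene class): every token is
// posted under itself and under each of its synonyms at indexing time.
public class SynonymIndex {
    // Assumed stand-in for a Wordnet lookup: term -> list of synonyms.
    private final Map<String, List<String>> synonyms;
    // term -> ids of the documents that contain the term or one of its synonyms
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    public SynonymIndex(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    // Splits the text on whitespace and records each token plus its synonyms.
    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            post(token, docId);
            for (String syn : synonyms.getOrDefault(token, Collections.emptyList())) {
                post(syn, docId);
            }
        }
    }

    private void post(String term, int docId) {
        postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
    }

    // Returns the ids of documents posted under the given term.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }
}
```

With a map that lists "automobile" as a synonym of "car", a document containing "car" would also be returned for the query "automobile". The cost is a larger index, and the synonym list is frozen at indexing time.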

If you want to set it up so that all documents that contain synonyms of at
least one query term have positive scores, then it may be easier to do
term expansion via post-processing of the query: at query time, get
whatever related words you want (e.g., all synonyms of each term) from
Wordnet, and add them to the query.  I would guess that this would be fast
enough for your purposes, is more flexible (in case you want to expand or
contract your notion of a synonym), and requires no additional index
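
The query-time expansion described above might look roughly like the following sketch. It rewrites each query term as an OR-group in Lucene's query-parser syntax before the string is handed to the parser. The class QueryExpander and the synonym map (standing in for a Wordnet lookup) are hypothetical, and the sketch assumes single-word terms and synonyms; multi-word synonyms would need quoting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: expands a plain query string with synonyms at query time.
public class QueryExpander {
    // Assumed stand-in for a Wordnet lookup: term -> list of synonyms.
    private final Map<String, List<String>> synonyms;

    public QueryExpander(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    // Rewrites each whitespace-separated term as "(term OR syn1 OR syn2)".
    // Terms without known synonyms are passed through unchanged.
    public String expand(String query) {
        List<String> groups = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            // LinkedHashSet keeps the original term first and drops duplicates.
            Set<String> alternatives = new LinkedHashSet<>();
            alternatives.add(term);
            alternatives.addAll(synonyms.getOrDefault(term, Collections.emptyList()));
            groups.add(alternatives.size() == 1
                    ? term
                    : "(" + String.join(" OR ", alternatives) + ")");
        }
        return String.join(" ", groups);
    }
}
```

Because the expansion happens on the query string, changing the notion of a synonym only means changing the map; nothing has to be reindexed.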

If you have something else in mind, what is it?


Joshua O'Madadhain

(Incidentally, I'd be interested to hear--off-list--of your experiences
with using Wordnet.)
  Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
 It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.
