lucene-solr-user mailing list archives

From Walter Underwood <wunderw...@netflix.com>
Subject Re: Searchproblem composite words
Date Thu, 03 May 2007 16:44:32 GMT
I agree that multi-word synonyms are an excellent way to do this.

This may sound like a hack, but you'd end up doing this even if
you had dedicated linguistic compound decomposition software.
Those usually use a dictionary of common words and the dictionary
rarely has all the words that are important for your site.

I'll be doing this for my site to handle things like "dreamgirls"
and "dream girls".
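
To make the idea concrete, here is a rough sketch in plain Python. It is
not Solr's SynonymFilter or its synonym file format; the COMPOUNDS table
and the expand_tokens/matches helpers are hypothetical names made up for
illustration. It only shows the index-time idea: each listed compound is
indexed together with its parts, so a search for a part finds it.

```python
# Hypothetical hand-maintained compound list (an assumption for this sketch,
# not Solr's synonym file format).
COMPOUNDS = {
    "wishlist": ["wish", "list"],
    "dreamgirls": ["dream", "girls"],
}


def expand_tokens(tokens):
    """Return index-time tokens: each token plus the parts of any listed compound."""
    expanded = []
    for token in tokens:
        key = token.lower()
        expanded.append(key)
        # Only words enumerated in COMPOUNDS are decomposed.
        expanded.extend(COMPOUNDS.get(key, []))
    return expanded


def matches(query, document):
    """True if every query word appears among the document's expanded tokens."""
    indexed = set(expand_tokens(document.split()))
    return all(word.lower() in indexed for word in query.split())


print(matches("list", "my wishlist"))                  # True
print(matches("dream girls", "Dreamgirls soundtrack"))  # True
print(matches("able", "cable car"))                    # False
```

Note that this sketch only goes one way: a query for "dreamgirls" would
not find a document that says "dream girls". Handling that direction
needs multi-word matching, which is left out here.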

wunder

On 5/2/07 11:58 AM, "Chris Hostetter" <hossman_lucene@fucit.org> wrote:

> 
> : For example I have the composite word "wishlist" in my document. I can
> : easily find the document by using the search string "wishlist" or "wish*"
> : but I don't get any result with "list".
> 
> what you are describing is basically a substring search problem ...
> sometimes this can be dealt with by using something like the
> WordDelimiterFilter -- but only if people are using "WishList" in their
> documents.
> 
> Another approach would be to use an NGram based tokenizer (built in
> support for this will probably be added soon) but then searches for things
> like "able" will match words like "cable" ... which may not be what you
> want (yes it is a substring, but it is not what anyone would consider a
> "composite word")
> 
> the best way to match what you want extremely accurately would be to use
> the SynonymFilter and enumerate every composite word you care about in the
> Synonym list ... tedious yes, but also very accurate.
> 
> -Hoss
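
The n-gram trade-off Hoss describes can be seen in a small plain-Python
sketch. This is not Lucene's tokenizer; char_ngrams and ngram_match are
hypothetical helpers, and the 3-to-5 character gram range is an arbitrary
assumption. Every word is indexed as all of its character n-grams, so any
substring in that length range matches, wanted or not.

```python
def char_ngrams(word, min_n=3, max_n=5):
    """Return the set of character n-grams of a word for lengths min_n..max_n."""
    word = word.lower()
    grams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(word) - n + 1):
            grams.add(word[i:i + n])
    return grams


def ngram_match(query, document, min_n=3, max_n=5):
    """True if the query equals an indexed n-gram of any word in the document."""
    query = query.lower()
    # Queries outside the indexed gram lengths can never match in this sketch.
    if not (min_n <= len(query) <= max_n):
        return False
    return any(query in char_ngrams(word, min_n, max_n)
               for word in document.split())


print(ngram_match("list", "wishlist"))  # True, the match we wanted
print(ngram_match("able", "cable"))     # True, the false positive
```

The second result is the problem: "able" is a substring of "cable" but
not a part anyone would call a composite word, which is why the
enumerated synonym list is more accurate.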


