lucene-solr-user mailing list archives

From Walter Underwood <wunderw...@netflix.com>
Subject Re: token concat filter?
Date Thu, 01 May 2008 18:20:24 GMT
I doubt it would be that many. I recommend tracking the searches and
the clicks, and working on queries with low clickthrough.
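As a rough illustration of that kind of tracking, here is a minimal sketch that ranks queries by clickthrough. It assumes a log already reduced to (query, clicked) pairs; the function name, the thresholds, and the log shape are all hypothetical and not taken from any Solr feature.

```python
from collections import defaultdict

def low_clickthrough(events, min_searches=50, max_ctr=0.2):
    """Return (query, searches, ctr) rows for queries that rarely get a click.

    events: iterable of (query, clicked) pairs, where clicked is True/False.
    min_searches and max_ctr are illustrative thresholds, not recommendations.
    """
    searches = defaultdict(int)
    clicks = defaultdict(int)
    for query, clicked in events:
        key = " ".join(query.lower().split())  # normalize case and whitespace
        searches[key] += 1
        if clicked:
            clicks[key] += 1

    report = []
    for key, n in searches.items():
        ctr = clicks[key] / n
        if n >= min_searches and ctr <= max_ctr:
            report.append((key, n, ctr))

    # Most-searched problem queries first, ties broken alphabetically.
    report.sort(key=lambda row: (-row[1], row[0]))
    return report
```

The queries at the top of that report are the ones worth checking by hand for a missing concatenated or split form.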

Here are a few of mine from that sort of analysis:

ghost dog => ghost dog, ghostdog
ghost hunters => ghost hunters, ghosthunters
ghost rider => ghost rider, ghostrider
ghost world => ghost world, ghostworld
ghostbusters => ghostbusters, ghost busters
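Rules in this shape can be generated rather than typed. The sketch below builds one line per word pair in the same "a => a, b" form as the list above; the helper name and the glued_is_query flag are hypothetical, and it assumes you already know where the word boundary falls.

```python
def synonym_rule(first, second, glued_is_query=False):
    """Build one synonym line for a two-word group.

    glued_is_query=False: left side is the spaced form ("ghost dog").
    glued_is_query=True:  left side is the glued form ("ghostbusters").
    """
    spaced = f"{first} {second}".lower()
    glued = f"{first}{second}".lower()
    if glued_is_query:
        return f"{glued} => {glued}, {spaced}"
    return f"{spaced} => {spaced}, {glued}"


if __name__ == "__main__":
    # Pairs taken from the examples in this message.
    for pair in [("ghost", "dog"), ("ghost", "hunters"), ("ghost", "rider")]:
        print(synonym_rule(*pair))
    print(synonym_rule("ghost", "busters", glued_is_query=True))
```

Which side goes on the left depends on which spelling people actually type, so the flag is set per entry rather than guessed.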

I don't see as many in personal names, mostly things like "De Niro"
and "DiCaprio".

wunder

On 5/1/08 11:13 AM, "Geoffrey Young" <geoff@modperlcookbook.org> wrote:
> Walter Underwood wrote:
>> I've been doing it with synonyms and I have several hundred of them.
> 
> I'm dealing mostly with proper names, so I expect more like 80k of them
> for our data :)
> 
>> Concatenating bi-word groups is pretty useful for English. We have a
>> habit of gluing words together. "database" used to be two words.
>> Dictionaries still think it should be "web server".
> 
> :)
> 
> --Geoff

