lucene-solr-user mailing list archives

From "Bastian Spitzer" <bspit...@magix.net>
Subject AW: how to support "implicit trailing wildcards"
Date Mon, 09 Aug 2010 14:27:27 GMT
Wildcard search is already built in; just use:

?q=umoun*
?q=mounta*
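
If the wildcard has to be implicit, the client can append it before sending the query. The following is only a minimal sketch of that idea: the endpoint URL and the function names are assumptions made for illustration, and it only handles the simple case where the last term is a plain word.

```python
import re
from urllib.parse import urlencode

# Hypothetical endpoint; adjust to the actual Solr host and request handler.
SOLR_SELECT = "http://localhost:8983/solr/select"

# Tokens that must not get a wildcard appended.
OPERATORS = {"AND", "OR", "NOT"}


def with_trailing_wildcard(user_input: str) -> str:
    """Append '*' to the last term when it is a plain word."""
    terms = user_input.split()
    if not terms:
        return user_input
    last = terms[-1]
    # Leave operators, quoted phrases, and terms that already carry
    # special characters (such as an explicit wildcard) untouched.
    if re.fullmatch(r"\w+", last) and last not in OPERATORS:
        terms[-1] = last + "*"
    return " ".join(terms)


def build_url(user_input: str) -> str:
    """Build a select URL whose q parameter carries the rewritten query."""
    return SOLR_SELECT + "?" + urlencode({"q": with_trailing_wildcard(user_input)})


if __name__ == "__main__":
    for text in ("umoun", "mounta"):
        print(build_url(text))
```

Anything more complex than a plain trailing word (quoted phrases, '+' and '-' prefixes) is passed through unchanged here, so those cases would still need real query parsing.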

-----Original Message-----
From: yandong yao [mailto:yydzero@gmail.com]
Sent: Monday, 9 August 2010 15:57
To: solr-user@lucene.apache.org
Subject: how to support "implicit trailing wildcards"

Hi everyone,


How can I support an 'implicit trailing wildcard *' in Solr? For example, when searching
Google for 'umoun', 'umount' is matched, and when searching for 'mounta', 'mountain'
is matched.

From my point of view, there are several ways, each with disadvantages:

1) Use EdgeNGramFilterFactory, so that 'umount' is indexed as 'u', 'um', 'umo', 'umou',
'umoun', 'umount'. The disadvantages are: a) the index size increases dramatically, and b) it
produces matches between unrelated words; for example, 'mount' will also match 'mountain'.

2) Use two-pass searching: the first pass looks up the given keyword in the term dictionary
through TermsComponent, and the second pass searches again using the first matching term. For
example, when the user enters 'umoun', TermsComponent will match 'umount', and 'umount' is then
used for the search. The disadvantages are: a) the query string has to be parsed in order to
recognize meta keywords such as 'AND', 'OR', '+', '-', '"' (this is more complex because I am
using a PHP client), and b) the returned hit count is not for the original search string, which
will affect other components, such as an auto-suggest component based on user search history
and hit counts.

3) Write a custom SearchComponent, though I have no idea where or how to start.
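
The trade-offs in options 1 and 2 can be illustrated with a small in-memory sketch. It does not call Solr: the function names and the tiny term list are assumptions made up for illustration, and the sorted list only stands in for the term dictionary that TermsComponent would expose.

```python
from bisect import bisect_left


def edge_ngrams(term, min_len=1):
    """Return all leading prefixes of a term, as an edge n-gram filter would index them."""
    return [term[:n] for n in range(min_len, len(term) + 1)]


def first_term_with_prefix(sorted_terms, prefix):
    """First pass of option 2: first dictionary term starting with prefix, or None."""
    i = bisect_left(sorted_terms, prefix)
    if i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        return sorted_terms[i]
    return None


if __name__ == "__main__":
    # Hypothetical term dictionary, kept sorted like a real one.
    terms = sorted(["mount", "mountain", "umount"])

    # Option 1: one indexed token per prefix, so the index grows with term length.
    print(edge_ngrams("umount"))

    # Option 1, drawback b): 'mount' is among the prefixes indexed for 'mountain'.
    print("mount" in edge_ngrams("mountain"))

    # Option 2: expand the partial input to a full term, then search with that term.
    print(first_term_with_prefix(terms, "umoun"))
    print(first_term_with_prefix(terms, "mounta"))
```

In option 2 the second pass runs with the expanded term, so the hit count reflects 'umount' rather than the 'umoun' that was typed, which is the drawback described in 2 b).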

Is there any other way to do this in Solr? Any feedback or suggestions are welcome!

Thanks very much in advance!
