lucene-solr-user mailing list archives

From Jan Høydahl / Cominvent <jan....@cominvent.com>
Subject Re: how to support "implicit trailing wildcards"
Date Tue, 10 Aug 2010 13:09:37 GMT
Hi,

You don't need to duplicate the content into two fields to achieve this. Try this:

q=mount OR mount*

The exact match will always get a higher score than the wildcard match, because wildcard
matches use "constant score".

Making this work for multi-term queries is a bit trickier, but something along these lines should work:

q=(mount OR mount*) AND (everest OR everest*)
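
If you build the query string on the client side, the pattern above can be generated from the raw user input. Here is a minimal sketch in Python, assuming whitespace-separated terms and simple backslash escaping of query syntax characters; the function name build_query is hypothetical and not part of Solr:

```python
import re

# Characters with special meaning in the Lucene/Solr query syntax.
# Escaping them makes the user's input be treated as literal text.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def build_query(user_input: str) -> str:
    """Turn 'mount everest' into '(mount OR mount*) AND (everest OR everest*)'.

    Hypothetical helper: every whitespace-separated term becomes a clause
    that matches the exact term or any term starting with it.
    """
    clauses = []
    for term in user_input.split():
        safe = _SPECIAL.sub(r"\\\1", term)  # prefix each special char with a backslash
        clauses.append(f"({safe} OR {safe}*)")
    return " AND ".join(clauses)


print(build_query("mount everest"))
```

The result still has to be URL-encoded before it is sent as the q parameter. Note also that the wildcard half of each clause is not passed through the analyzer, so the term has to match the indexed form as typed.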

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 10. aug. 2010, at 09.38, Geert-Jan Brits wrote:

> you could satisfy this by making 2 fields:
> 1. exactmatch
> 2. wildcardmatch
> 
> use copyfield in your schema to copy 1 --> 2 .
> 
> q=exactmatch:mount+wildcardmatch:mount*&q.op=OR
> this would score exact matches above (solely) wildcard matches
> 
> Geert-Jan
> 
> 2010/8/10 yandong yao <yydzero@gmail.com>
> 
>> Hi Bastian,
>> 
>> Sorry for not making it clear. I also want an exact match to have a higher
>> score than a wildcard match. That means: when searching for 'mount', documents
>> with 'mount' should score higher than documents with 'mountain', while
>> 'mount*' seems to treat 'mount' and 'mountain' the same.
>> 
>> Besides, I also want the query to be processed by the analyzer, while
>> according to
>> http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F
>> Wildcard, Prefix, and Fuzzy queries are not passed through the Analyzer.
>> The rationale is that if I search for 'mounted', I also want documents
>> with 'mount' to match.
>> 
>> So it seems the built-in wildcard search cannot satisfy my requirements, if I
>> understand correctly.
>> 
>> Thanks very much!
>> 
>> 
>> 2010/8/9 Bastian Spitzer <bspitzer@magix.net>
>> 
>>> Wildcard-Search is already built in, just use:
>>> 
>>> ?q=umoun*
>>> ?q=mounta*
>>> 
>>> -----Original Message-----
>>> From: yandong yao [mailto:yydzero@gmail.com]
>>> Sent: Monday, 9 August 2010 15:57
>>> To: solr-user@lucene.apache.org
>>> Subject: how to support "implicit trailing wildcards"
>>> 
>>> Hi everyone,
>>> 
>>> 
>>> How can I support an 'implicit trailing wildcard *' in Solr? E.g. when using
>>> Google to search for 'umoun', 'umount' is matched; when searching for
>>> 'mounta', 'mountain' is matched.
>>> 
>>> From my point of view, there are several ways, each with disadvantages:
>>> 
>>> 1) Using EdgeNGramFilterFactory, so that 'umount' is indexed as 'u',
>>> 'um', 'umo', 'umou', 'umoun', 'umount'. The disadvantages are: a) the
>>> index size increases dramatically, b) it matches terms that have no
>>> relationship to the query, e.g. 'mount' will also match 'mountain'.
>>> 
>>> 2) Using two-pass searching: the first pass searches the term dictionary
>>> through TermsComponent using the given keyword, then the first matched term
>>> from the term dictionary is used to search again. E.g. when the user enters
>>> 'umoun', TermsComponent will match 'umount', and 'umount' is then used to
>>> search. The disadvantages are: a) the query string needs to be parsed so
>>> that meta keywords such as 'AND', 'OR', '+', '-', '"' are recognized (this
>>> is more complex as I am using a PHP client), b) the returned hit count is
>>> not for the original search string, which will influence other components
>>> such as an auto-suggest component based on user search history and hit
>>> counts.
>>> 
>>> 3) Writing a custom SearchComponent, but I have no idea where or how to
>>> start.
>>> 
>>> Is there any other way to do this in Solr? Any feedback or suggestions are
>>> welcome!
>>> 
>>> Thanks very much in advance!
>>> 
>> 

