lucene-solr-user mailing list archives

From Jonathan Rochkind <rochk...@jhu.edu>
Subject Re: Advice on Exact Matching?
Date Tue, 04 Jan 2011 21:50:36 GMT
There is a somewhat hacky technique that Bill Dueber figured out for using
multiple fields and dismax to BOOST "exact" matches while still including
all matches in the result set.

You have to duplicate your data in a second, non-tokenized field. Then
you use the dismax pf (phrase fields) parameter to super boost matches on
the non-tokenized field. Because 'pf' is a phrase search, you don't run
into trouble with dismax "pre-tokenization" on whitespace, even though the
field's values may contain whitespace inside a single token. (Using a
non-tokenized field in dismax qf will basically never match a value
containing whitespace unless the query is phrase-quoted. But pf works.)

Because it's a non-tokenized field, it only matches (and triggers the
dismax pf super boost) on an exact match. And it works. You CAN
normalize your 'exact match' field in its field analysis, removing
punctuation, normalizing whitespace or whatever, and that works too,
as long as you do it in both the index-time and query-time analysis.
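Here is a sketch of that normalization idea in Python, as a toy single-term index rather than real Solr analysis. The specific rules (lowercasing, dropping punctuation, collapsing whitespace) and the names `normalize_exact` and `ExactIndex` are assumptions for illustration. In Solr the equivalent would live in the field type's analyzer chain, for example a KeywordTokenizerFactory followed by filters such as LowerCaseFilterFactory and PatternReplaceFilterFactory. The point it shows is that the same function has to run on the indexed value and on the query.

```python
import re

# Hypothetical normalization rules for the "exact match" field.
_PUNCT = re.compile(r"[^\w\s]")  # anything that is not a word char or space
_SPACE = re.compile(r"\s+")      # runs of whitespace


def normalize_exact(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace, trim the ends."""
    text = text.lower()
    text = _PUNCT.sub("", text)
    text = _SPACE.sub(" ", text)
    return text.strip()


class ExactIndex:
    """Toy index where each document contributes one normalized term."""

    def __init__(self) -> None:
        self._terms: dict[str, list[int]] = {}

    def add(self, doc_id: int, value: str) -> None:
        # Index-time analysis: normalize before storing the single term.
        self._terms.setdefault(normalize_exact(value), []).append(doc_id)

    def exact_matches(self, query: str) -> list[int]:
        # Query-time analysis: the SAME normalization, then one term lookup.
        return list(self._terms.get(normalize_exact(query), []))
```

For example, after `idx.add(2, "The Hobbit")`, both `idx.exact_matches("the hobbit")` and `idx.exact_matches("  The   HOBBIT!")` return `[2]`, while a document titled "The Hobbit: or, There and Back Again" is returned for neither query.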



On 1/4/2011 4:28 PM, Chris Hostetter wrote:
> : I am trying to make sure that when I search for text, regardless of
> : what that text is, I get an exact match.  I'm *still* getting some
> : issues, and this last mile is becoming very painful.  The Solr field
> : I'm setting this up on is pasted below my explanation.  I
> : appreciate any help.
>
> if you are using a TextField with some analysis components, it's
> virtually impossible to get "exact" matches -- where my definition of
> exact is that the query text is character-for-character identical to the
> entire indexed field value.
>
> is your definition of exact match different?  i assume it must be, since you
> are using TextField and talk about wanting to deal with whitespace between
> words.  so i think you need to explain a little bit better what your
> indexed data looks like, and what sample queries you expect to match that
> data (and, equally important, what queries should *not* match that data,
> and what data should *not* match those queries).
>
> : If I want to find *all* Solr documents that match
> : "[id]somejunk\hi[/id]", then life is instantly hell.
>
> 90% of the time when people have problems with "exact" matches, it's
> because of QueryParser metacharacters -- characters like ":", "[" and
> whitespace that the QueryParser uses as instructions.  you can use the
> "raw" QParser to have every character treated as a literal:
>
> 	defType=raw
> 	q=[id]somejunk\hi[/id]
>
> -Hoss
>
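Hoss's raw QParser suggestion can be sketched as a small Python helper that builds the request parameters. The raw parser creates a single term query on the field named by `f`, with no analysis, so `{!raw f=myfield}Foo Bar` looks up the literal term "Foo Bar" in `myfield`. Here `f` is supplied through local params; the field name `id_exact` is hypothetical. Because nothing is analyzed, this only matches when the indexed term is identical, which suits a non-tokenized field. The special-character list is the classic Lucene query-parser set and is an assumption to verify against your version's docs.

```python
from urllib.parse import urlencode

# Classic Lucene query-parser metacharacters (assumption: check the docs for
# your Solr/Lucene version). Whitespace also acts as syntax, since it
# separates clauses.
LUCENE_SPECIAL = frozenset('+-&|!(){}[]^"~*?:\\')


def syntax_chars(q: str) -> list[str]:
    """Characters in q that the standard query parser would treat as syntax."""
    return [c for c in q if c in LUCENE_SPECIAL or c.isspace()]


def raw_query_params(value: str, field: str) -> str:
    """URL-encoded params that use the raw QParser via local params.

    The raw parser turns `value` into one term query on `field` with no
    analysis, so none of the characters above are interpreted.
    `field` is a hypothetical non-tokenized field name.
    """
    return urlencode({"q": "{!raw f=%s}%s" % (field, value)})


if __name__ == "__main__":
    query = r"[id]somejunk\hi[/id]"
    print(syntax_chars(query))  # the brackets and backslash the parser would act on
    print(raw_query_params(query, "id_exact"))
```

The first call lists the brackets and the backslash in `[id]somejunk\hi[/id]` that the standard parser would try to interpret; the second produces a `q=...` string you could append to a select request, with every character of the value passed through literally.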
