lucene-solr-user mailing list archives

From Jonathan Rochkind <rochk...@jhu.edu>
Subject Re: anti-words - exact match
Date Fri, 06 Aug 2010 00:14:57 GMT
This is tricky. You could try doing something with the ShingleFilter 
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory) 
at _query time_ to turn the user's query:

"i have a swollen foot" into:
"i", "i have", "i have a", "i have a swollen", .... "have", "have a", 
"have a swollen"... etc.

I _think_ you can get the ShingleFilter factory to do that.
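
To make the target output concrete, here is a minimal plain-Python sketch of the shingling itself. It is not the ShingleFilter, and the function name `shingles` and its `max_size` parameter are made up for illustration; it only shows which word sequences the query would expand into.

```python
def shingles(text, max_size=None):
    """Return every contiguous run of words in `text`, 1..max_size words long.

    Hypothetical helper for illustration only; it mimics the *output* you
    would want from shingling, not Solr's ShingleFilter implementation.
    """
    words = text.split()
    if max_size is None:
        max_size = len(words)
    result = []
    for start in range(len(words)):
        for size in range(1, max_size + 1):
            if start + size > len(words):
                break  # ran off the end of the query
            result.append(" ".join(words[start:start + size]))
    return result


for s in shingles("i have a swollen foot"):
    print(s)
```

The list includes "swollen foot" as one of its entries, which is the shingle that would need to match an anti-word value exactly.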

But now you only want to exclude if one of those shingles matches the 
ENTIRE "anti-word". So maybe index the anti-words field as non-tokenized, 
so that each of those shingles can only match on the complete value. You'd 
want to normalize spacing and punctuation.
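
A rough sketch of what that normalization might look like if done in application code, assuming you also want to lowercase (that part is my assumption, not a requirement stated above). The function name `normalize` is hypothetical; whatever rules you pick, the same ones have to be applied to the anti-word values before indexing and to the query text.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace.

    Hypothetical helper; lowercasing is an assumption added here.
    """
    lowered = text.lower()
    # Anything that is not a word character or whitespace becomes a space.
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    # split() with no arguments drops leading/trailing and repeated whitespace.
    return " ".join(no_punct.split())


print(normalize("  Swollen,  foot! "))
```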

But then you need to turn that into a _negated_ element of your query. 
Perhaps by using an fq with a NOT/"-" in it? And a query which 'matches' 
(causing 'not' behavior) if _any_ of the shingles match.

I have no idea if it's actually possible to put these things together in 
that way. A non-tokenized field? Which still has its queries 
shingle-ized at query time? And then works as a negated query, matching 
for negation if any of the shingles match?  Not really sure how to put 
that together in your solrconfig.xml and/or application logic if needed. 
You could try.

Another option would be doing the query-time 'shingling' in your app, 
and then it's a somewhat more normal Solr query. &fq= -"shingle one" 
-"shingle two" -"shingle three" etc.  Or put them in separate fq's 
depending on how you want to use your filter cache. Still searching on a 
non-tokenized field, and still normalizing on white-space and 
punctuation at both index time and (using same normalization logic but 
in your application logic this time) query time.  I think that might work.
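
A minimal sketch of building that filter in the app, assuming the shingles have already been generated and normalized. The field name `anti_words` and the function name `build_anti_word_fq` are made up for illustration, and I have not tested the resulting filter against a real Solr index.

```python
def build_anti_word_fq(phrases, field="anti_words"):
    """Build one fq value that negates each phrase against `field`.

    `field` is a hypothetical name for the non-tokenized anti-words field.
    Each phrase is quoted so it is treated as a single value; backslashes
    and double quotes inside a phrase are escaped.
    """
    clauses = []
    for phrase in phrases:
        escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append('-{}:"{}"'.format(field, escaped))
    return " ".join(clauses)


print(build_anti_word_fq(["i have", "swollen foot", "foot"]))
```

The returned string still needs URL-encoding before it goes into the request. To use separate fq's instead, emit one clause per fq parameter rather than joining them with spaces.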

So I'm not really sure, but maybe that gives you some ideas.

Jonathan



Satish Kumar wrote:
> Hi,
>
> We have a requirement to NOT display search results if user query contains
> terms that are in our anti-words field. For example, if user query is "I
> have swollen foot" and if some records in our index have "swollen foot" in
> anti-words field, we don't want to display those records. How do I go about
> implementing this?
>
> NOTE 1: anti-words field can contain multiple values. Each value can be a
> one or multiple words (e.g. "swollen foot", "headache", etc. )
>
> NOTE 2: the match must be exact. If anti-words field contains "swollen foot"
> and if user query is "I have swollen foot", record must be excluded. If user
> query is "My foot is swollen", the record should not be excluded.
>
> Any pointers is greatly appreciated!
>
>
> Thanks,
> Satish
>
>   
