lucene-solr-user mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: When searching for !@#$%^&*() all documents are matched incorrectly
Date Mon, 01 Jun 2009 14:33:50 GMT
OK, here's the deal:

<str name="rawquerystring">-features:foo features:(\!@#$%\^&\*\(\))</str>
<str name="querystring">-features:foo features:(\!@#$%\^&\*\(\))</str>
<str name="parsedquery">-features:foo</str>
<str name="parsedquery_toString">-features:foo</str>

The text analysis is throwing away the non-alphanumeric chars (probably
the WordDelimiterFilter).  The Lucene (and Solr) query parser throws
away term queries when the token is zero length (after analysis), so
the features:(...) clause disappears from the parsed query entirely.
Solr then interprets the leftover "-features:foo" as "all documents
not containing foo in the features field", so you get a bunch of
matches.
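
To make that concrete, here is a minimal Python sketch of the behaviour
described above. It is a toy model only, not Solr or Lucene code:
`analyze` and `build_query` are hypothetical names, and the regex is
just an assumed approximation of what the analyzer chain does to
punctuation.

```python
import re


def analyze(text):
    """Toy stand-in for the analyzer chain (an assumption, not Solr's code):
    split on whitespace, strip non-alphanumeric characters, lowercase,
    and discard any token that ends up empty."""
    tokens = []
    for raw in text.split():
        cleaned = re.sub(r"[^A-Za-z0-9]", "", raw).lower()
        if cleaned:
            tokens.append(cleaned)
    return tokens


def build_query(clauses):
    """Build a parsed-query string from (prefix, field, text) clauses.

    A clause whose text analyzes to zero tokens contributes nothing,
    which mimics the parser dropping the term query."""
    parsed = []
    for prefix, field, text in clauses:
        for token in analyze(text):
            parsed.append(f"{prefix}{field}:{token}")
    return " ".join(parsed)


if __name__ == "__main__":
    query = build_query([
        ("-", "features", "foo"),
        ("", "features", "!@#$%^&*()"),
    ])
    print(query)  # -features:foo
```

The second clause vanishes because its only token is empty after
cleaning, so all that remains is the negative clause, matching the
parsedquery shown above.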

-Yonik
http://www.lucidimagination.com


On Mon, Jun 1, 2009 at 10:15 AM, Sam Michaels <masu69@yahoo.com> wrote:
>
> Walter,
>
> The analysis link does not produce any matches for either @ or !@#$%^&*()
> strings when I try to match against bathing. I'm worried that this might be
> the symptom of another problem (which has not revealed itself yet) and want
> to get to the bottom of this...
>
> Thank you.
> sm
>
>
> Walter Underwood wrote:
>>
>> Use the [analysis] link on the Solr admin UI to get more info on
>> how this is being interpreted.
>>
>> However, I am curious about why this is important. Do users enter
>> this query often? If not, maybe it is not something to spend time on.
>>
>> wunder
>>
>> On 5/31/09 2:56 PM, "Sam Michaels" <masu69@yahoo.com> wrote:
>>
>>>
>>> Here is the output from the debug query when I'm trying to match the
>>> String @
>>> against Bathing (should not match)
>>>
>>> <str name="GLOM-1">
>>> 3.2689073 = (MATCH) weight(activity_type:NAME in 0), product of:
>>>   0.99999994 = queryWeight(activity_type:NAME), product of:
>>>     3.2689075 = idf(docFreq=153, numDocs=1489)
>>>     0.30591258 = queryNorm
>>>   3.2689075 = (MATCH) fieldWeight(activity_type:NAME in 0), product of:
>>>     1.0 = tf(termFreq(activity_type:NAME)=1)
>>>     3.2689075 = idf(docFreq=153, numDocs=1489)
>>>     1.0 = fieldNorm(field=activity_type, doc=0)
>>> </str>
>>>
>>> Looks like the AND clause in the search string is ignored...
>>>
>>> SM.
>>>
>>>
>>> ryantxu wrote:
>>>>
>>>> two key things to try (for anyone ever wondering why a query matches
>>>> documents)
>>>>
>>>> 1.  add &debugQuery=true and look at the explain text below --
>>>> anything that contributed to the score is listed there
>>>> 2.  check /admin/analysis.jsp -- this will let you see how analyzers
>>>> break text up into tokens.
>>>>
>>>> Not sure off hand, but I'm guessing the WordDelimiterFilterFactory has
>>>> something to do with it...
>>>>
>>>>
>>>> On Sat, May 30, 2009 at 5:59 PM, Sam Michaels <masu69@yahoo.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm running Solr 1.3/Java 1.6.
>>>>>
>>>>> When I run a query like  - (activity_type:NAME) AND
>>>>> title:(\!@#$%\^&\*\(\))
>>>>> all the documents are returned even though there is not a single match.
>>>>> There is no title that matches the string (which has been escaped).
>>>>>
>>>>> My document structure is as follows
>>>>>
>>>>> <doc>
>>>>> <str name="activity_type">NAME</str>
>>>>> <str name="title">Bathing</str>
>>>>> ....
>>>>> </doc>
>>>>>
>>>>>
>>>>> The title field is of type text_title which is described below.
>>>>>
>>>>> <fieldType name="text_title" class="solr.TextField"
>>>>> positionIncrementGap="100">
>>>>>      <analyzer type="index">
>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>        <!-- in this example, we will only use synonyms at query time
>>>>>        <filter class="solr.SynonymFilterFactory"
>>>>> synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>>>>>        -->
>>>>>        <filter class="solr.WordDelimiterFilterFactory"
>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>> catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
>>>>>        <filter class="solr.LowerCaseFilterFactory"/>
>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>>      </analyzer>
>>>>>      <analyzer type="query">
>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>        <filter class="solr.SynonymFilterFactory"
>>>>> synonyms="synonyms.txt"
>>>>> ignoreCase="true" expand="true"/>
>>>>>        <filter class="solr.WordDelimiterFilterFactory"
>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>> catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
>>>>>        <filter class="solr.LowerCaseFilterFactory"/>
>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>>
>>>>>      </analyzer>
>>>>>    </fieldType>
>>>>>
>>>>> When I run the query against Luke, no results are returned. Any
>>>>> suggestions
>>>>> are appreciated.
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23797731.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>
>>>>
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23815688.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
