lucene-solr-user mailing list archives

From Christian Zambrano <czamb...@gmail.com>
Subject Re: Question regarding synonym
Date Mon, 05 Oct 2009 02:33:28 GMT


On 10/02/2009 06:02 PM, darniz wrote:
> Thanks.
> As I said, it even works by giving double quotes too,
> like carDescription:"austin martin"
>
> So is the conclusion that in order to map a two-word synonym I have to
> always enclose it in double quotes, so that it does not split the words?
>
>
>
>    
Yes, but there are things you need to keep in mind.

From the Solr wiki:

Keep in mind that while the SynonymFilter will happily work with
synonyms containing multiple words (i.e.
"sea biscuit, sea biscit, seabiscuit"), the recommended approach for
dealing with synonyms like this is to expand the synonym when
indexing. This is because there are two potential issues that can arise
at query time:

   1. The Lucene QueryParser tokenizes on white space before giving any
      text to the Analyzer, so if a person searches for the words
      sea biscit the analyzer will be given the words "sea" and "biscit"
      separately, and will not know that they match a synonym.

   2. Phrase searching (i.e. "sea biscit") will cause the QueryParser to
      pass the entire string to the analyzer, but if the SynonymFilter
      is configured to expand the synonyms, then when the QueryParser
      gets the resulting list of tokens back from the Analyzer, it will
      construct a MultiPhraseQuery that will not have the desired
      effect. This is because of the limited mechanism available for the
      Analyzer to indicate that two terms occupy the same position:
      there is no way to indicate that a "phrase" occupies the same
      position as a term. For our example the resulting MultiPhraseQuery
      would be "(sea | sea | seabiscuit) (biscuit | biscit)" which would
      not match the simple case of "seabiscuit" occurring in a document.
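
To make the two issues above concrete, here is a rough Python sketch. It is
only a toy model of the behaviour the wiki describes, not Solr or Lucene
code: every function name below is made up for illustration, and the
synonym table just mirrors the one-way mapping from this thread.

```python
# Toy model only: these helpers are hypothetical, NOT Solr or Lucene APIs.
# One-way mapping from the thread: austin martin, astonmartin => aston martin
SYNONYMS = {
    ("austin", "martin"): ("aston", "martin"),
    ("astonmartin",): ("aston", "martin"),
}


def apply_synonyms(tokens):
    """Replace the longest matching run of consecutive tokens, left to right."""
    out, i = [], 0
    max_len = max(len(key) for key in SYNONYMS)
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in SYNONYMS:
                out.extend(SYNONYMS[key])
                i += n
                break
        else:
            # No synonym rule starts at this token; keep it unchanged.
            out.append(tokens[i])
            i += 1
    return out


def analyze_unquoted(query):
    """Issue 1: split on whitespace first, then analyze each word alone."""
    out = []
    for word in query.lower().split():
        out.extend(apply_synonyms([word]))
    return out


def analyze_phrase(query):
    """Quoted phrase: the whole string reaches the synonym step at once."""
    return apply_synonyms(query.lower().split())


def stack_by_position(variants):
    """Issue 2: expanded variants are stacked token by token, per position."""
    slots = []
    for variant in variants:
        for pos, token in enumerate(variant.split()):
            if pos == len(slots):
                slots.append([])
            if token not in slots[pos]:  # duplicates dropped for readability
                slots[pos].append(token)
    return slots


def phrase_matches(slots, doc_tokens):
    """True if some run of consecutive doc tokens fills every position."""
    n = len(slots)
    return any(
        all(doc_tokens[i + j] in slots[j] for j in range(n))
        for i in range(len(doc_tokens) - n + 1)
    )


slots = stack_by_position(["sea biscuit", "sea biscit", "seabiscuit"])
print(analyze_unquoted("austin martin"))
print(analyze_phrase("austin martin"))
print(slots)
print(phrase_matches(slots, ["seabiscuit"]))
```

In this model the unquoted query never sees the two-word rule because each
word is analyzed alone, and the stacked phrase always needs two consecutive
tokens, so a document holding only the single token "seabiscuit" cannot
match it.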


> Christian Zambrano wrote:
>    
>> When you use a field qualifier (fieldName:valueToLookFor), it only applies
>> to the word right after the colon. If you look at the debug
>> information you will notice that for the second word it is using the
>> default field.
>>
>> <str name="parsedquery_toString">carDescription:austin text:martin</str>
>>
>> The following should work:
>>
>> carDescription:(austin martin)
>>
>>
>> On 10/02/2009 05:46 PM, darniz wrote:
>>      
>>> This is not working. When I search, I have a document which contains
>>> the text aston martin.
>>>
>>> When I search carDescription:"austin martin" I get a match, but when I
>>> don't give double quotes,
>>>
>>> like carDescription:austin martin
>>> there is no match.
>>>
>>> In the analyzer, if I give austin martin without quotes, when it passes
>>> through the synonym filter it matches aston martin.
>>> Maybe by default the analyzer treats it as a phrase "austin martin", but
>>> when I try to do a query by typing
>>> carDescription:austin martin I get 0 documents. The following is the
>>> debug node info with debugQuery=on:
>>>
>>> <str name="rawquerystring">carDescription:austin martin</str>
>>> <str name="querystring">carDescription:austin martin</str>
>>> <str name="parsedquery">carDescription:austin text:martin</str>
>>> <str name="parsedquery_toString">carDescription:austin text:martin</str>
>>>
>>> I don't know why it breaks the words; maybe it's the desired behaviour.
>>> When I give carDescription:"austin martin", of course it is able to
>>> map to the synonym and I get the desired result.
>>>
>>> Any opinions?
>>>
>>> darniz
>>>
>>>
>>>
>>> Ensdorf Ken wrote:
>>>
>>>        
>>>>
>>>>          
>>>>> Hi,
>>>>> I have a question regarding SynonymFilter.
>>>>> I have a one-way mapping defined:
>>>>> austin martin, astonmartin => aston martin
>>>>>
>>>>>
>>>>>            
>>>> ...
>>>>
>>>>          
>>>>> Can anybody please explain if my observation is correct. This is a very
>>>>> critical aspect for my work.
>>>>>
>>>>>            
>>>> That is correct - the synonym filter can recognize multi-token synonyms
>>>> from consecutive tokens in a stream.
>>>>
>>>>
>>>>
>>>>
>>>>          
>>>
>>>        
>>
>>      
>    
