lucene-solr-user mailing list archives

From Christian Zambrano <czamb...@gmail.com>
Subject Re: Question regarding synonym
Date Mon, 05 Oct 2009 17:10:46 GMT
You are correct.

I would recommend using the Synonym TokenFilter only at index time,
unless you have a very good reason to apply it at query time.
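
As a rough illustration of why index-time expansion makes the single-word
queries below match, here is a toy Python model. It is not Lucene or Solr
code: the names SYNONYM_GROUPS, expand_at_index_time and search are made up
for this sketch, and it ignores token positions, which the real filter tracks.

```python
# Toy model of index-time synonym expansion (expand=true style).
# Not Lucene/Solr code; all names here are hypothetical.

SYNONYM_GROUPS = [
    ["bayrische motoren werke", "bmw"],
]


def expand_at_index_time(text):
    """Return the set of terms indexed for `text`.

    If any entry of a synonym group occurs in the text as a run of
    consecutive tokens, the words of every entry in that group are added.
    """
    tokens = text.lower().split()
    terms = set(tokens)
    for group in SYNONYM_GROUPS:
        for entry in group:
            words = entry.split()
            n = len(words)
            # Look for the entry as consecutive tokens in the text.
            if any(tokens[i:i + n] == words
                   for i in range(len(tokens) - n + 1)):
                for other in group:
                    terms.update(other.split())
                break
    return terms


def search(index, term):
    """Return the sorted ids of documents whose indexed terms contain `term`."""
    return sorted(doc_id for doc_id, terms in index.items()
                  if term.lower() in terms)


docs = {1: "new bmw for sale", 2: "used audi"}
index = {doc_id: expand_at_index_time(text) for doc_id, text in docs.items()}

print(search(index, "bayrische"))
print(search(index, "motoren"))
print(search(index, "audi"))
```

Because the expansion happens once when the document is indexed, a plain
single-term query needs no synonym handling at query time in this model.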

On 10/05/2009 11:46 AM, darniz wrote:
> Yes, that is what we decided: to expand these terms while indexing.
> If we have
> bayrische motoren werke => bmw
>
> and I have a document which has bmw in it, searching for text:bayrische does
> not give me results. I have to give
> text:"bayrische motoren werke"; then it actually applies the synonym and gets
> me the document.
>
> Now suppose I change the synonym mapping to
> bayrische motoren werke, bmw
> with the expand parameter set to true, and also use this file at indexing.
>
> Then, when I index this document, along with "bmw" I also index the
> following words: "bayrische", "motoren", "werke".
>
> Any text query like text:motoren or text:bayrische will give me results now.
>
> Please correct me if my assumption is wrong.
>
> Thanks
> darniz
>
> Christian Zambrano wrote:
>    
>>
>>
>> On 10/02/2009 06:02 PM, darniz wrote:
>>      
>>> Thanks.
>>> As I said, it also works when I use double quotes,
>>> like carDescription:"austin martin"
>>>
>>> So is the conclusion that, in order to map a two-word synonym, I always
>>> have to enclose it in double quotes so that it does not split the words?
>>>
>>>
>>>
>>>
>>>        
>> Yes, but there are things you need to keep in mind.
>>
>>   From the solr wiki:
>>
>> Keep in mind that while the SynonymFilter will happily work with
>> *synonyms* containing multiple words (ie:
>> "sea biscuit, sea biscit, seabiscuit") The recommended approach for
>> dealing with *synonyms* like this, is to expand the synonym when
>> indexing. This is because there are two potential issues that can arise
>> at query time:
>>
>>     1.
>>
>>        The Lucene QueryParser tokenizes on white space before giving any
>>        text to the Analyzer, so if a person searches for the words
>>        sea biscit the analyzer will be given the words "sea" and "biscit"
>>        separately, and will not know that they match a synonym.
>>
>>     2.
>>
>>        Phrase searching (ie: "sea biscit") will cause the QueryParser to
>>        pass the entire string to the analyzer, but if the SynonymFilter
>>        is configured to expand the *synonyms*, then when the QueryParser
>>        gets the resulting list of tokens back from the Analyzer, it will
>>        construct a MultiPhraseQuery that will not have the desired
>>        effect. This is because of the limited mechanism available for the
>>        Analyzer to indicate that two terms occupy the same position:
>>        there is no way to indicate that a "phrase" occupies the same
>>        position as a term. For our example the resulting MultiPhraseQuery
>>        would be "(sea | sea | seabiscuit) (biscuit | biscit)" which would
>>        not match the simple case of "seabiscuit" occurring in a document.
>>
>>
>>      
>>>
>>> Christian Zambrano wrote:
>>>
>>>        
>>>> When you use a field qualifier (fieldName:valueToLookFor), it applies only
>>>> to the word right after the colon. If you look at the debug
>>>> information, you will notice that for the second word it is using the
>>>> default field.
>>>>
>>>> <str name="parsedquery_toString">carDescription:austin
>>>> *text*:martin</str>
>>>>
>>>> The following should work:
>>>>
>>>> carDescription:(austin martin)
>>>>
>>>>
>>>> On 10/02/2009 05:46 PM, darniz wrote:
>>>>
>>>>          
>>>>> This is not working. I have a document which contains the
>>>>> text aston martin.
>>>>>
>>>>> When I search carDescription:"austin martin" I get a match, but when I
>>>>> don't give double quotes,
>>>>>
>>>>> like carDescription:austin martin
>>>>> there is no match.
>>>>>
>>>>> In the analyser, if I give austin martin without quotes, when it passes
>>>>> through the synonym filter it matches aston martin.
>>>>> Maybe by default the analyser treats it as a phrase "austin martin", but
>>>>> when I try to do a query by typing
>>>>> carDescription:austin martin I get 0 documents. The following is the
>>>>> debug node info with debugQuery=on:
>>>>>
>>>>> <str name="rawquerystring">carDescription:austin martin</str>
>>>>> <str name="querystring">carDescription:austin martin</str>
>>>>> <str name="parsedquery">carDescription:austin text:martin</str>
>>>>> <str name="parsedquery_toString">carDescription:austin
>>>>> text:martin</str>
>>>>>
>>>>> I don't know why it breaks the words; maybe it is the desired behaviour.
>>>>> When I give carDescription:"austin martin", of course, it is able to
>>>>> map to the synonym and I get the desired result.
>>>>>
>>>>> Any opinion?
>>>>>
>>>>> darniz
>>>>>
>>>>>
>>>>>
>>>>> Ensdorf Ken wrote:
>>>>>
>>>>>
>>>>>            
>>>>>>
>>>>>>              
>>>>>>> Hi,
>>>>>>> I have a question regarding SynonymFilter.
>>>>>>> I have a one-way mapping defined:
>>>>>>> austin martin, astonmartin => aston martin
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> ...
>>>>>>
>>>>>>
>>>>>>              
>>>>>>> Can anybody please explain if my observation is correct. This is a
>>>>>>> very critical aspect for my work.
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> That is correct - the synonym filter can recognize multi-token
>>>>>> synonyms
>>>>>> from consecutive tokens in a stream.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>              
>>>>>
>>>>>            
>>>>
>>>>          
>>>
>>>        
>>
>>      
>    
