lucene-java-user mailing list archives

From baris.ka...@oracle.com
Subject Re: SynonymGraphFilter
Date Thu, 13 Sep 2018 13:33:57 GMT
Thanks Michael. I think this answers my questions.

Best regards


On 9/12/18 8:23 PM, Michael Sokolov wrote:
> Usually one will either apply synonyms at index time or apply them at query
> time, but not both. I think the situation is that you will get the most correct
> behavior, respecting synonym graph structure, with query time synonyms.
>
> Index time synonyms may give better performance, but at the cost of some
> overlap along time positions that results from the need for flattening, as
> in the quote you provided. If you use only query time synonyms there is no
> need to flatten.
>
> On Thu, Sep 13, 2018, 12:59 AM <baris.kazar@oracle.com> wrote:
>
>> Any examples on the following note on the Javadocs at
>>
>> https://lucene.apache.org/core/6_4_1/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilter.html
>>
>>
>> Quoted from the above url:
>>
>> However, if you use this during indexing, you must follow it with
>> FlattenGraphFilter to squash tokens on top of one another like
>> SynonymFilter, because the indexer can't directly consume a graph. To
>> get fully correct positional queries when your synonym replacements are
>> multiple tokens, you should instead apply synonyms using this
>> TokenFilter at query time and translate the resulting graph to a
>> TermAutomatonQuery e.g. using TokenStreamToTermAutomatonQuery.
>>
>> End of quote
>>
>>
>> This will make the code really hard to maintain if we separate synonyms
>> based on the number of tokens.
>>
>> Any suggestions please?
>>
>> Best regards
>>
>>
>>
>>
>> On 9/11/18 1:45 PM, baris.kazar@oracle.com wrote:
>>> Mike,
>>>
>>> Great article, thanks for that. I was thinking about exactly this
>>> reverse mapping when I was writing this question. I guess it would be
>>> nicer if Lucene applied both mappings when one is called for, or
>>> offered another parameter to activate this double mapping.
>>>
>>> My next question is: can a synonym contain a space (be made of
>>> multiple words)?
>>>
>>> Last question on this: should I repeat this at both index and
>>> query time?
>>> Best regards
>>>
>>> On 9/11/18 1:39 PM, Michael McCandless wrote:
>>>> Try reading the blog post I wrote about token stream graphs?
>>>>
>>>>
>>>> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
>>>>
>>>> Mike McCandless
>>>>
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>> On Tue, Sep 11, 2018 at 1:35 PM, <baris.kazar@oracle.com> wrote:
>>>>
>>>>> Any comments please?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> On 9/10/18 5:07 PM, baris.kazar@oracle.com wrote:
>>>>>
>>>>>> Any examples on this? I think it would be nice if the Javadocs had an
>>>>>> example on this:
>>>>>>
>>>>>> However, if you use this during indexing, you must follow it with
>>>>>> FlattenGraphFilter to squash tokens on top of one another like
>>>>>> SynonymFilter, because the indexer can't directly consume a graph.
>>>>>> To get
>>>>>> fully correct positional queries when your synonym replacements are
>>>>>> multiple tokens, you should instead apply synonyms using this
>>>>>> TokenFilter
>>>>>> at query time and translate the resulting graph to a
>>>>>> TermAutomatonQuery
>>>>>> e.g. using TokenStreamToTermAutomatonQuery.
>>>>>>
>>>>>> Does "multiple tokens" mean a synonym with multiple equivalents,
>>>>>> or a synonym made of multiple words?
>>>>>>
>>>>>> This is not clear to me.
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>> On 9/10/18 3:15 PM, baris.kazar@oracle.com wrote:
>>>>>>
>>>>>>> https://lucene.apache.org/core/6_4_1/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilter.html
>>>>>>>
>>>>>>> Does this mean I don't have to repeat it in the search analyzer
>>>>>>> when I do this at indexing time?
>>>>>>>
>>>>>>> Best regards
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>>
>>
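
Since the thread asks for an example of what "multiple tokens" and "respecting synonym graph structure" mean, here is a minimal sketch in plain Java with no Lucene dependency. It models each token by its term, position increment and position length, then lists every word sequence the graph allows. The class TokenGraphPaths, the Tok type, and the hand-written token list for "dns is down" with an assumed synonym "dns" -> "domain name system" are all hypothetical illustrations; they are not Lucene API and not captured SynonymGraphFilter output. The second call drops position length, which, as far as I understand, is the information an index cannot store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenGraphPaths {

    /** Hypothetical token model: term, position increment, position length. */
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Hand-written graph for "dns is down" with "dns" -> "domain name system". */
    static List<Tok> example() {
        return Arrays.asList(
                new Tok("domain", 1, 1),
                new Tok("dns", 0, 3),     // same start as "domain", spans 3 positions
                new Tok("name", 1, 1),
                new Tok("system", 1, 1),
                new Tok("is", 1, 1),
                new Tok("down", 1, 1));
    }

    /** Lists every term sequence leading from the first node to the last node. */
    static List<String> paths(List<Tok> tokens, boolean honorPosLen) {
        int n = tokens.size();
        int[] from = new int[n];
        int[] to = new int[n];
        int pos = -1;
        int last = 0;
        for (int i = 0; i < n; i++) {
            Tok t = tokens.get(i);
            pos += t.posInc;                              // start node of this token
            from[i] = pos;
            to[i] = pos + (honorPosLen ? t.posLen : 1);   // end node of this token
            last = Math.max(last, to[i]);
        }
        List<String> out = new ArrayList<>();
        walk(tokens, from, to, 0, last, "", out);
        return out;
    }

    private static void walk(List<Tok> tokens, int[] from, int[] to,
                             int node, int last, String prefix, List<String> out) {
        if (node == last) {
            out.add(prefix.trim());
            return;
        }
        for (int i = 0; i < tokens.size(); i++) {
            if (from[i] == node) {
                walk(tokens, from, to, to[i], last, prefix + " " + tokens.get(i).term, out);
            }
        }
    }

    public static void main(String[] args) {
        // Position length honored: both readings of the text are kept apart.
        System.out.println(paths(example(), true));
        // Position length ignored: "dns" collapses onto a single position.
        System.out.println(paths(example(), false));
    }
}
```

In this toy model the first call yields "domain name system is down" and "dns is down", while the second yields "domain name system is down" and "dns name system is down": the real phrase "dns is down" is lost and a phrase that never occurred appears. That is the kind of positional error the quoted Javadoc warns about for multi-word synonyms, and why only a single-word synonym is unaffected.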


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

