lucene-solr-user mailing list archives

From Barbet Alain <alian123sol...@gmail.com>
Subject Re: Custom analyzer & frequency
Date Tue, 21 Nov 2017 15:55:21 GMT
You rock, thank you so much for this clear answer. I lost 2 days for
nothing, since I already had the term freq, but now I have an answer :-)
(And yes, I checked: it's the doc freq, not the term freq.)

Thank you again !
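
For anyone finding this thread later, the difference between the two counts can be sketched in a few lines of Python. This is only an illustration, assuming a naive whitespace tokenizer rather than a real Solr analyzer. The helper names (`tokenize`, `doc_freq`, `total_term_freq`, `stats_query_url`) are made up for this example. The last helper only builds a query URL that asks Solr for per-term statistics through the `termfreq`, `ttf` and `docfreq` function queries; the `tf`, `ttf` and `df` aliases are arbitrary labels, and nothing is sent to a server.

```python
from collections import Counter
from urllib.parse import urlencode

# The single test document posted in this thread (docid -> field text).
docs = {"666": "toto titi tata toto tutu titi"}


def tokenize(text):
    # Naive whitespace split with lowercasing; a real analyzer does more.
    return text.lower().split()


def doc_freq(docs):
    # Number of documents containing each term: one count per document,
    # no matter how often the term occurs inside it.
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return df


def total_term_freq(docs):
    # Sum of the occurrences of each term over all documents.
    ttf = Counter()
    for text in docs.values():
        ttf.update(tokenize(text))
    return ttf


def stats_query_url(base_url, field, term):
    # Builds (but does not send) a /select URL that uses Solr function
    # queries as pseudo-fields. The aliases tf, ttf and df are arbitrary.
    fl = ",".join([
        "docid",
        f"tf:termfreq({field},'{term}')",
        f"ttf:ttf({field},'{term}')",
        f"df:docfreq({field},'{term}')",
    ])
    return base_url + "/select?" + urlencode({"q": "*:*", "fl": fl})


if __name__ == "__main__":
    for term, count in sorted(doc_freq(docs).items()):
        print("doc freq", term, count)
    for term, count in sorted(total_term_freq(docs).items()):
        print("total term freq", term, count)
    print(stats_query_url("http://localhost:8983/solr/alian_test",
                          "titi_txt_fr", "toto"))
```

With one document in the index, every term has a doc freq of 1, which matches what the terms handler returned, while toto and titi each have a total term freq of 2.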

2017-11-21 16:34 GMT+01:00 Emir Arnautović <emir.arnautovic@sematext.com>:
> Hi Alain,
> As explained in the previous mail, that is the doc frequency, and each doc is counted once.
> I am not sure if Luke can provide information about the overall term frequency (the sum of
> the term frequencies across all docs).
>
> Regards,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 21 Nov 2017, at 16:30, Barbet Alain <alian123soleil@gmail.com> wrote:
>>
>> $ cat add_test.sh
>> DATA='
>> <add>
>>  <doc>
>>    <field name="docid">666</field>
>>    <field name="titi_txt_fr">toto titi tata toto tutu titi</field>
>>  </doc>
>> </add>
>> '
>> $ sh add_test.sh
>> <?xml version="1.0" encoding="UTF-8"?>
>> <response>
>> <lst name="responseHeader"><int name="status">0</int><int
>> name="QTime">484</int></lst>
>> </response>
>>
>>
>> $ curl 'http://localhost:8983/solr/alian_test/terms?terms.fl=titi_txt_fr&terms.sort=index'
>> <?xml version="1.0" encoding="UTF-8"?>
>> <response>
>> <lst name="responseHeader"><int name="status">0</int><int
>> name="QTime">0</int></lst><lst name="terms"><lst
>> name="titi_txt_fr"><int name="tata">1</int><int
>> name="titi">1</int><int name="toto">1</int><int
>> name="tutu">1</int></lst></lst>
>> </response>
>>
>>
>> So it's not only on the Luke side, it comes from Solr. Does that sound normal?
>>
>> 2017-11-21 11:43 GMT+01:00 Barbet Alain <alian123soleil@gmail.com>:
>>> Hi,
>>>
>>> I built a custom analyzer & set it up in Solr, but it doesn't work as I expect.
>>> I always get 1 as the frequency for each word, even if it's present
>>> multiple times in the text.
>>>
>>> So I tried with the default analyzer & found the same behavior.
>>> My schema:
>>>
>>>  <fieldType name="text_ami" class="solr.TextField">
>>>    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
>>>  </fieldType>
>>>  <field name="docid" type="string" indexed="true" required="true"
>>> stored="true"/>
>>>  <field name="test_text" type="nametext"/>
>>>
>>> alian@yoda:~/solr> cat add_test.sh
>>> DATA='
>>> <add>
>>>  <doc>
>>>    <field name="docid">666</field>
>>>    <field name="test_text">toto titi tata toto tutu titi</field>
>>>  </doc>
>>> </add>
>>> '
>>> curl -X POST -H 'Content-Type: text/xml' \
>>>   'http://localhost:8983/solr/alian_test/update?commit=true' \
>>>   --data-binary "$DATA"
>>>
>>> When I test in the Solr interface / analyze, I see the right behavior
>>> (titi & toto are found 2 times).
>>> But when I look at the Solr index with Luke or the Solr interface / schema,
>>> the top terms always get 1 as frequency. Can someone tell me what
>>> I forgot?
>>>
>>> (solr 6.5)
>>>
>>> Thank you !
>
