lucene-solr-user mailing list archives

From Emir Arnautović <emir.arnauto...@sematext.com>
Subject Re: Custom analyzer & frequency
Date Tue, 21 Nov 2017 15:34:25 GMT
Hi Alain,
As explained in my previous mail, that number is the document frequency, so each document
is counted once no matter how many times the term occurs in it. I am not sure whether Luke
can show the overall term frequency, i.e. the sum of the term's frequency across all docs.
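
To make the distinction concrete, here is a minimal Python sketch that computes both numbers for the single test document from this thread. It is not Solr or Lucene code: the `docs` dictionary and both helper functions are hypothetical names introduced for illustration, and a plain whitespace split stands in for the real analyzer.

```python
from collections import Counter

# Hypothetical mini-index: docid -> analyzed tokens.
# A whitespace split stands in for the real analyzer (assumption).
docs = {
    "666": "toto titi tata toto tutu titi".split(),
}


def doc_frequency(docs):
    """Number of documents containing each term (each doc counted once)."""
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # set() drops repeats within one document
    return df


def total_term_frequency(docs):
    """Sum of per-document term frequencies over all documents."""
    ttf = Counter()
    for tokens in docs.values():
        ttf.update(tokens)  # repeats within a document are all counted
    return ttf


df = doc_frequency(docs)
ttf = total_term_frequency(docs)
for term in sorted(df):
    print(term, "df=%d" % df[term], "ttf=%d" % ttf[term])
```

With one document in the index, every term has a document frequency of 1, which matches the /terms output quoted below, while the total term frequency of "toto" and "titi" is 2. If I remember correctly, Solr's function queries `docfreq()`, `termfreq()` and `ttf()` expose the corresponding values at query time, but please check that against the reference guide for your version.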

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 21 Nov 2017, at 16:30, Barbet Alain <alian123soleil@gmail.com> wrote:
> 
> $ cat add_test.sh
> DATA='
> <add>
>  <doc>
>    <field name="docid">666</field>
>    <field name="titi_txt_fr">toto titi tata toto tutu titi</field>
>  </doc>
> </add>
> '
> $ sh add_test.sh
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int
> name="QTime">484</int></lst>
> </response>
> 
> 
> $ curl 'http://localhost:8983/solr/alian_test/terms?terms.fl=titi_txt_fr&terms.sort=index'
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int
> name="QTime">0</int></lst><lst name="terms"><lst
> name="titi_txt_fr"><int name="tata">1</int><int
> name="titi">1</int><int name="toto">1</int><int
> name="tutu">1</int></lst></lst>
> </response>
> 
> 
> So it's not only on the Luke side; it comes from Solr. Does that sound normal?
> 
> 2017-11-21 11:43 GMT+01:00 Barbet Alain <alian123soleil@gmail.com>:
>> Hi,
>> 
>> I built a custom analyzer and set it up in Solr, but it doesn't work as I expect.
>> I always get 1 as the frequency for each word, even if it is present
>> multiple times in the text.
>> 
>> So I tried the default analyzer and found the same behavior.
>> My schema:
>> 
>>  <fieldType name="text_ami" class="solr.TextField">
>>    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
>>  </fieldType>
>>  <field name="docid" type="string" indexed="true" required="true"
>> stored="true"/>
>>  <field name="test_text" type="nametext"/>
>> 
>> alian@yoda:~/solr> cat add_test.sh
>> DATA='
>> <add>
>>  <doc>
>>    <field name="docid">666</field>
>>    <field name="test_text">toto titi tata toto tutu titi</field>
>>  </doc>
>> </add>
>> '
>> curl -X POST -H 'Content-Type: text/xml' \
>>   'http://localhost:8983/solr/alian_test/update?commit=true' \
>>   --data-binary "$DATA"
>> 
>> When I test in the Solr interface / Analysis, I see the expected behavior
>> (titi and toto are each found 2 times).
>> But when I look at the Solr index with Luke or the Solr interface / Schema,
>> the top terms always have a frequency of 1. Can someone tell me what
>> I am missing?
>> 
>> (solr 6.5)
>> 
>> Thank you !

