lucene-solr-user mailing list archives

From Emir Arnautović <emir.arnauto...@sematext.com>
Subject Re: Custom analyzer & frequency
Date Tue, 21 Nov 2017 15:30:47 GMT
Hi Alain,
I haven’t used the Luke UI in a while, but if you are talking about the top terms for a
field, that number is probably doc freq, not term freq, so every doc is counted once. That
is equivalent to “Load Term Info” under “Schema” in the Solr Admin console.
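
To make the difference concrete, here is a minimal Python sketch. It is not Lucene or Solr code: the whitespace tokenizer, the helper names, and the second document (667) are assumptions made up for illustration. Only document 666 comes from this thread.

```python
from collections import Counter

def tokenize(text):
    # Whitespace split stands in for a real analyzer chain (an assumption).
    return text.lower().split()

def term_freq(docs, doc_id, term):
    """Occurrences of `term` inside one document."""
    return Counter(tokenize(docs[doc_id]))[term]

def doc_freq(docs, term):
    """Number of documents containing `term` at least once."""
    return sum(1 for text in docs.values() if term in tokenize(text))

# Document 666 is the one from this thread; 667 is made up for contrast.
docs = {
    "666": "toto titi tata toto tutu titi",
    "667": "toto tata",
}

for term in ("toto", "titi", "tutu"):
    print(term, "tf(666) =", term_freq(docs, "666", term), "df =", doc_freq(docs, term))
```

With only document 666 in the index, doc_freq is 1 for every term, which is what a "top terms" list based on doc freq would show, while term_freq for toto and titi is 2. On a live index, Solr's termfreq() and docfreq() function queries can be requested in the fl parameter to compare the two values for a given term.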

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 21 Nov 2017, at 16:21, Barbet Alain <alian123soleil@gmail.com> wrote:
> 
> Thank you very much for your answer.
> 
> It was a copy / paste error in my mail, sorry about that!
> It was already a text field, so omitTermFreqAndPositions was
> already “false”.
> 
> So I set my custom analyzer aside and tested with an already defined
> field type (text_fr), and I see the same behaviour in Luke!
> So I looked more closely.
> In Luke, when I step through the terms one by one on the "Document" tab,
> I see my frequency set to 2.
> But in the first panel of Luke, "Overview", under "Show top terms", Freq is
> still 1 for all values.
> 
> I use Solr 6.5 & Luke 7.1. I don't see this behavior if I open a
> Lucene index I built outside Solr: there the top terms freq is the
> same on both panels.
> Do you know a reason for that?
> Does this have an impact on Solr search? Does the bad freq in "top terms"
> come from Luke or Solr?
> 
> 
> 2017-11-21 12:08 GMT+01:00 Emir Arnautović <emir.arnautovic@sematext.com>:
>> Hi Alain,
>> You did not provide the definition of the field type you actually use: the field
>> uses the “nametext” type, but you pasted the “text_ami” field type. It is possible
>> that you have omitTermFreqAndPositions="true" on the nametext field type. The
>> default value for text fields should be false.
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 21 Nov 2017, at 11:43, Barbet Alain <alian123soleil@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I built a custom analyzer & set it up in Solr, but it doesn't work as I expect.
>>> I always get 1 as the frequency for each word, even if it's present
>>> multiple times in the text.
>>>
>>> So I tried with the default analyzer & found the same behavior.
>>> My schema:
>>> 
>>> <fieldType name="text_ami" class="solr.TextField">
>>>   <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
>>> </fieldType>
>>> <field name="docid" type="string" indexed="true" required="true" stored="true"/>
>>> <field name="test_text" type="nametext"/>
>>> 
>>> alian@yoda:~/solr> cat add_test.sh
>>> DATA='
>>> <add>
>>> <doc>
>>>   <field name="docid">666</field>
>>>   <field name="test_text">toto titi tata toto tutu titi</field>
>>> </doc>
>>> </add>
>>> '
>>> curl -X POST -H 'Content-Type: text/xml' \
>>>   'http://localhost:8983/solr/alian_test/update?commit=true' \
>>>   --data-binary "$DATA"
>>> 
>>> When I test in the Solr admin interface / Analysis, I see the right behavior
>>> (titi & toto are found 2 times).
>>> But when I look at the Solr index with Luke or the Solr interface / Schema,
>>> the top terms always get 1 as frequency. Can someone tell me what I am
>>> missing?
>>> 
>>> (solr 6.5)
>>> 
>>> Thank you !
>> 

