lucene-solr-user mailing list archives

From nettadalet <nsteinb...@dalet.com>
Subject Why do I get different results for the same query with two Solr versions?
Date Thu, 24 Dec 2020 14:35:22 GMT
Hello,

I have the same field type defined in Solr 4.6 and Solr 7.5. When I
search with both versions, I get different results, and I don't know why.

I have the following *field type definition in Solr 4.6*:
<fieldType name="text_type1" class="solr.TextField" positionIncrementGap="1000">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


I have the following *field type definition in Solr 7.5*:
<fieldType name="text_type1" class="solr.TextField" positionIncrementGap="1000">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

* I also tried solr.WordDelimiterFilterFactory with Solr 7.5 instead of
solr.WordDelimiterGraphFilterFactory, so that the field types would be more
alike, but Solr 7.5 still returned the same results.

I have the following *6 values in field text1 (of type text_type1, defined
above), one value per document*:
KI_d5e7b43a
KI_b7c490bd
KI_7df2f026
KI_fa7d129d
KI_5867aec7
KI_7c3c0b93
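To make the term streams concrete, here is a rough Python model of what the word-delimiter step does to these values. It is a sketch under assumptions, not Solr's implementation: `analyze` and `VALUES` are hypothetical names, and the model only splits letter runs from digit runs at delimiters such as `_`, then lowercases. It ignores catenated tokens, ASCII folding, stopwords and synonyms, and treats positions as plain list indexes.

```python
import re

# Hypothetical input: the six field values from the message.
VALUES = ["KI_d5e7b43a", "KI_b7c490bd", "KI_7df2f026",
          "KI_fa7d129d", "KI_5867aec7", "KI_7c3c0b93"]


def analyze(value):
    """Rough model (assumption) of the chain: whitespace tokenizing, then a
    word-delimiter split with generateWordParts=1, generateNumberParts=1,
    splitOnCaseChange=0 and letter/digit splitting, then lowercasing.
    Catenation, ASCII folding, stopwords and synonyms are NOT modelled."""
    terms = []
    for chunk in value.split():  # stands in for WhitespaceTokenizer
        # '_' is dropped as a delimiter; letter runs and digit runs become parts.
        terms.extend(part.lower() for part in re.findall(r"[A-Za-z]+|[0-9]+", chunk))
    return terms


if __name__ == "__main__":
    for value in VALUES:
        # Pair each term with its (modelled) position: 0, 1, 2, ...
        print(value, list(enumerate(analyze(value))))
```

Under this model every one of the six values yields both `ki` and `7` as separate terms; they differ only in where `7` sits relative to `ki`.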


My query is *text1:KI_7*.
Using Solr 4.6, I get 2 results: KI_7df2f026 and KI_7c3c0b93.
Using Solr 7.5, I get all 6 results.

Questions:
1. Why do I get different results with the same data when my field
definitions are the same (as far as I can tell)?

2. What are the expected results?
I think the results Solr 7.5 returns are the correct ones: at the end of the
analysis I get *ki* as a term and *7* as a term, both in the index-time
analysis and in the query-time analysis, so, as I understand it, all 6
documents should be found.
Is this correct? If not, what am I missing or misunderstanding?
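A small Python model can check the reasoning in question 2 under stated assumptions. It compares three ways the query terms `ki` and `7` might be combined: any term present (OR, like SHOULD clauses), all terms present anywhere (AND, like MUST clauses), and a phrase requiring the terms at adjacent positions. The helpers `analyze`, `any_term`, `all_terms`, `phrase` and `hits` are hypothetical names, not Solr or Lucene APIs, and the analysis is simplified (letter and digit runs, lowercased; no catenation, stopwords, synonyms or folding). In this model both term-based interpretations match all 6 documents, while only the phrase interpretation reproduces the 2 hits seen with Solr 4.6. That suggests, but does not prove, that the 4.6 setup turned KI_7 into a phrase query; one setting known to control this in Solr is the field type's `autoGeneratePhraseQueries` attribute, whose default depends on the schema version.

```python
import re

# Hypothetical documents: the six field values from the message.
DOCS = ["KI_d5e7b43a", "KI_b7c490bd", "KI_7df2f026",
        "KI_fa7d129d", "KI_5867aec7", "KI_7c3c0b93"]


def analyze(value):
    """Simplified word-delimiter analysis (assumption): split letter runs from
    digit runs, drop delimiters such as '_', lowercase. Positions = list indexes."""
    return [part.lower() for part in re.findall(r"[A-Za-z]+|[0-9]+", value)]


def any_term(doc_terms, query_terms):
    """OR semantics: at least one query term occurs in the document."""
    return any(t in doc_terms for t in query_terms)


def all_terms(doc_terms, query_terms):
    """AND semantics: every query term occurs somewhere in the document."""
    return all(t in doc_terms for t in query_terms)


def phrase(doc_terms, query_terms):
    """Phrase semantics: the query terms occur at consecutive positions."""
    n = len(query_terms)
    return any(doc_terms[i:i + n] == query_terms
               for i in range(len(doc_terms) - n + 1))


def hits(matcher, query="KI_7"):
    """Return the documents that the given matching rule accepts for the query."""
    query_terms = analyze(query)
    return [doc for doc in DOCS if matcher(analyze(doc), query_terms)]


if __name__ == "__main__":
    for matcher in (any_term, all_terms, phrase):
        print(matcher.__name__, hits(matcher))
```

Running it prints all six documents for `any_term` and `all_terms`, and only KI_7df2f026 and KI_7c3c0b93 for `phrase`.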

I would very much appreciate a full or partial answer, but even a link that
explains at least the expected results would be great.

Thanks in advance. I know this might be a tough one to answer [hope not :)]



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
