lucene-solr-user mailing list archives

From Alvaro Cabrerizo <topor...@gmail.com>
Subject synonyms and term position
Date Wed, 09 Oct 2013 07:08:20 GMT
Hi:

I'm in the process of upgrading Solr from 1.4 to 4.4 and I'm having a
problem using SynonymFilterFactory within the analysis chain
SynonymFilterFactory, StopFilterFactory.

I have configured synonyms.txt to expand the word AIO as: all-in-one. When
using Solr 1.4 I get the following result (term positions) when
analysing the string "one aio two".

Solr 1.4 after synonym:

term position | 1   | 2   | 3   | 4   | 5
term text     | one | all | in  | one | two

Solr 1.4 after StopFilter ("in" is deleted; "all" and "one" are now
adjacent tokens, at positions 2 and 4):

term position | 1   | 2   | 4   | 5
term text     | one | all | one | two



But when using Solr 4.4 I get:

Solr 4.4 after synonym:

term position | 1   | 2   | 3   | 4   | 3
term text     | one | all | in  | one | two

Solr 4.4 after StopFilter ("in" is deleted and the term "two" now
immediately follows "all"):

term position | 1   | 2   | 4   | 3
term text     | one | all | one | two
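
For reference, the "after stop" positions above can be reproduced with a small model of position increments. This is only a sketch, not Solr code: it assumes each term's position is the running sum of position increments, and that with enablePositionIncrements="true" a removed stopword hands its increment to the next kept token. The function name and the emission order of the Solr 4.4 tokens ("in" first, then "two" at the same position, then "one") are my assumptions for illustration.

```python
def positions_after_stop(tokens, stopwords):
    """Model a stop filter that preserves position gaps.

    tokens: list of (term, position_increment) in emission order.
    Returns a list of (term, absolute_position) for the kept tokens.
    """
    result = []
    position = 0
    pending = 0  # increments of removed tokens, not yet applied
    for term, increment in tokens:
        if term in stopwords:
            pending += increment  # keep the gap the stopword leaves
            continue
        position += increment + pending
        pending = 0
        result.append((term, position))
    return result


# Solr 1.4: the expansion pushes "two" out to position 5.
solr14 = [("one", 1), ("all", 1), ("in", 1), ("one", 1), ("two", 1)]

# Solr 4.4 (assumed emission order): "two" stays at position 3,
# sharing it with "in", so it gets a position increment of 0.
solr44 = [("one", 1), ("all", 1), ("in", 1), ("two", 0), ("one", 1)]

print(positions_after_stop(solr14, {"in"}))
print(positions_after_stop(solr44, {"in"}))
```

Under those assumptions, the first call leaves "two" at position 5 and the second leaves it at position 3, between "all" (2) and "one" (4), which matches the two tables.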



The problem is that the word "two" ends up in position 3 in Solr 4.4, inside
the expanded synonym. As a result, a search for aio returns results in
Solr 1.4 but finds nothing in Solr 4.4. Is there any option to configure
Solr 4 so that it imitates the Solr 1.4 behavior?


Regards.




Please find the fieldType configuration below.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt" />
  </analyzer>
</fieldType>
