lucene-solr-user mailing list archives

From Johann Höchtl <h.hoec...@ic-drei.de>
Subject Solr creates whitespace in dismax query
Date Tue, 24 Aug 2010 18:41:31 GMT
I have a fieldtype with the following definition:

     <fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
                 enablePositionIncrements="false"/>
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                 catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.SynonymFilterFactory" synonyms="openthesaurus.txt" ignoreCase="true"
                 expand="true"/>
         <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
                 protected="protwords.txt"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                 catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
                 protected="protwords.txt"/>
       </analyzer>
     </fieldType>

I have the value "blume2000.de" in a field with the fieldtype above. If I issue the query
select?q=blume2000&qt=dismax (yes, the field in question is searched by the dismax handler),
the result is empty. Only if I enter the query select?q=blume+2000&qt=dismax do I get the
result I want.

So I used debugQuery=true to find out what's wrong. The interesting thing is that the
rawquerystring is still correct, but the parsedquery is:
+DisjunctionMaxQuery((name:"blume 2000" | teaser:"blume 2000")) DisjunctionMaxQuery((teaser:"blume 2000"~3 | name:"blume 2000"~3))

Now I gotta ask, where does the whitespace come from and why isn't the document matched?
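
One plausible source of the whitespace, sketched here rather than confirmed against this setup: the WordDelimiterFilterFactory in the query analyzer splits a token at letter/digit transitions, so "blume2000" would come out as two tokens, and several tokens produced from a single query word are typically turned into a phrase. The Python below is only a rough simulation of that behaviour, not Solr code. The function names are hypothetical, and case-change splitting, stop words, synonyms, and stemming are left out. It does not explain why the document fails to match.

```python
import re


def word_delimiter_split(text):
    """Rough stand-in for word-delimiter splitting followed by lowercasing.

    Splits on non-alphanumeric characters and on letter/digit transitions.
    Assumption: ASCII input only; case-change splitting is not modelled.
    """
    return [part.lower() for part in re.findall(r"[A-Za-z]+|[0-9]+", text)]


def to_field_query(field, tokens):
    """Mimic how several tokens from one query word end up as a phrase."""
    if len(tokens) == 1:
        return f"{field}:{tokens[0]}"
    phrase = " ".join(tokens)
    return f'{field}:"{phrase}"'


if __name__ == "__main__":
    tokens = word_delimiter_split("blume2000")
    print(tokens)                          # ['blume', '2000']
    print(to_field_query("name", tokens))  # name:"blume 2000"
```

Under these assumptions the space in name:"blume 2000" is simply the separator between the two tokens of the phrase, which fits the rawquerystring still being shown unchanged.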

If I analyze the query using the admin backend with Field (name): name, Field value (Index):
blume2000.de and Field value (Query): blume2000.de, it works...

Has anybody run into this problem before?


