lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Query with plus sign failing
Date Thu, 29 Sep 2011 16:31:57 GMT
The following query is failing:

((Google +))

My analysis chain ultimately reduces this to 'google', but the following
appears in my log (this is from Solr 3.2.0; 3.4.0 fails the same way):

SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse '(  (Google +))': Encountered " ")" ") "" at line 1, column 12.

If I change it to 'Google+' or 'Goo+gle', it works.

Below is the fieldType definition. The pattern filter is designed to
strip leading and trailing punctuation characters while leaving any
punctuation in the middle of a term alone. A standalone plus sign is
made up entirely of punctuation, so the filter reduces it to a
zero-length term, which the length filter (min="1") then removes at the
end of the chain. In the 'Google+' variant, the pattern filter simply
strips the trailing plus sign and the query does not fail. Am I seeing
a bug here, or is there a problem with my fieldType?

<fieldType name="genText" class="solr.TextField" sortMissingLast="true"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- strip leading/trailing punctuation, keep punctuation inside a term -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
            replacement="$2"
            allowempty="false"
    />
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1"
            splitOnNumerics="1"
            stemEnglishPossessive="1"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            preserveOriginal="1"
    />
    <!-- Unicode case and accent folding -->
    <filter class="solr.ICUFoldingFilterFactory"/>
    <!-- drop zero-length tokens left behind by the pattern filter -->
    <filter class="solr.LengthFilterFactory" min="1" max="512"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- same punctuation stripping as the index analyzer -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$"
            replacement="$2"
            allowempty="false"
    />
    <!-- same as index side, except no catenation at query time -->
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1"
            splitOnNumerics="1"
            stemEnglishPossessive="1"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            preserveOriginal="1"
    />
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="1" max="512"/>
  </analyzer>
</fieldType>
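The pattern/length interaction described in the message can be checked outside Solr. Below is a minimal Java sketch, assuming that the pattern filter applies the regex to each token via `Matcher.replaceAll("$2")` and that the length filter drops tokens shorter than `min="1"`. It approximates only the whitespace tokenizer, pattern filter, and length filter of the query analyzer; WordDelimiterFilter and ICU folding are deliberately left out. The class and method names (`PunctStripDemo`, `analyze`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PunctStripDemo {
    // Same regex as the fieldType's PatternReplaceFilterFactory.
    // Java's \p{Punct} includes '+', so a lone "+" is all "edge" punctuation.
    private static final Pattern EDGE_PUNCT =
            Pattern.compile("^(\\p{Punct}*)(.*?)(\\p{Punct}*)$");

    // Rough stand-in for the query analyzer: whitespace split, pattern
    // replace keeping group 2, then a length filter with min=1, max=512.
    // WordDelimiterFilter and ICU folding are NOT modelled here.
    public static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // split() yields one empty token for empty input
            }
            String stripped = EDGE_PUNCT.matcher(token).replaceAll("$2");
            // Length filter: a bare "+" has become "" and is dropped here.
            if (stripped.length() >= 1 && stripped.length() <= 512) {
                out.add(stripped);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] inputs = {"Google +", "Google+", "Goo+gle", "+"};
        for (String q : inputs) {
            System.out.println("'" + q + "' -> " + analyze(q));
        }
    }
}
```

Run as-is, it should show the lone '+' vanishing from the token stream, 'Google+' losing its trailing plus sign, and 'Goo+gle' keeping its inner one. It exercises analysis only; it does not involve the query parser that raised the ParseException in the log.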
