lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: edismax phrase matching with a non-word char inbetween
Date Wed, 14 Dec 2011 14:00:56 GMT
What I think is happening here is that WordDelimiterFilterFactory is
throwing away your non-alphanumeric characters, so the "(" in "(Work"
is dropped and "street" and "work" end up as adjacent tokens. You can
see this on the admin/analysis page, which I've found *extremely*
helpful when faced with this kind of question.
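
To illustrate why the phrase can still match, here is a rough Python sketch of the effect. It is a toy model, not Lucene's or Solr's actual code: the helper names `analyze` and `exact_phrase_match` are made up for this example, and it assumes the chain reduces to whitespace tokenization, splitting on and discarding non-alphanumeric characters, and lowercasing. Synonyms, stop words, case-change splitting and stemming are ignored.

```python
import re


def analyze(text):
    """Toy approximation of the analysis chain (hypothetical helper).

    Splits on whitespace, then breaks each token on non-alphanumeric
    characters and discards them, then lowercases. Returns a list of
    (position, term) pairs with positions starting at 1.
    """
    tokens = []
    position = 0
    for raw in text.split():
        # "(Work" yields just "Work": the parenthesis produces no token.
        for part in re.findall(r"[A-Za-z0-9]+", raw):
            position += 1
            tokens.append((position, part.lower()))
    return tokens


def exact_phrase_match(doc_tokens, phrase_terms):
    """Return True if the terms occur at consecutive positions, in order.

    This models a phrase query with zero slop only.
    """
    if not phrase_terms:
        return False
    by_position = dict(doc_tokens)
    for start in by_position:
        if all(by_position.get(start + i) == term
               for i, term in enumerate(phrase_terms)):
            return True
    return False


doc_tokens = analyze("Oxford Street (Work Experience)")
query_terms = [term for _, term in analyze("street work")]

print(doc_tokens)
print(exact_phrase_match(doc_tokens, query_terms))
```

In this model "street" and "work" land at positions 2 and 3, so the phrase matches even with zero slop, because the parenthesis never becomes a token that could separate them. The admin/analysis page shows the real positions for your actual field type.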

Best
Erick

On Tue, Dec 13, 2011 at 10:37 AM, Robert Brown <rob@intelcompute.com> wrote:
> I have a field which is indexed and queried as follows:
>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>
> <filter class="solr.SynonymFilterFactory" synonyms="text-synonyms.txt"
> ignoreCase="true" expand="true"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
>
>
>
> When searching for "street work" (with quotes), I'm getting matches and
> highlighting on things like...
>
>
> "...Oxford <em>Street</em> (<em>Work</em> Experience)..."
>
>
> Why is this happening, and what can I do to stop it?
>
> I've set <int name="qs">0</int> in my config to try to avert this sort of
> behaviour. Am I correct in thinking that this is used to ensure there are no
> words in between the phrase words?
>
