lucene-solr-user mailing list archives

From Bernadette Houghton <bernadette.hough...@deakin.edu.au>
Subject Problems with WordDelimiterFilterFactory
Date Wed, 07 Oct 2009 22:32:16 GMT
We are having some issues with our Solr parent application not retrieving records as expected.

For example, if the input query includes a colon (e.g. hot and cold: temperatures), the relevant
record (which contains a colon in the same place) is not retrieved; if the input query
omits the colon, all is fine. The same happens when the query contains a hyphen with a space
on either side: "asia - civilization" (spaces either side of the hyphen) does not work, whereas
"asia-civilization" (no spaces either side of the hyphen) works fine.
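
To make the difference between these queries concrete, here is a rough Python sketch (not Solr code) of whitespace-only splitting, which is the first step of both analyzers in our schema. The helper name whitespace_tokenize is made up for illustration, and the sketch assumes the raw query string reaches the tokenizer unchanged; it does not model the query parser or any of the later filters.

```python
def whitespace_tokenize(text):
    """Split text on runs of whitespace (hypothetical stand-in for a whitespace tokenizer)."""
    # str.split() with no arguments splits on any run of whitespace
    # and drops leading/trailing whitespace.
    return text.split()


queries = [
    "hot and cold: temperatures",
    "asia-civilization",
    "asia - civilization",
]

for query in queries:
    # Show which tokens the rest of the filter chain would receive.
    print(repr(query), "->", whitespace_tokenize(query))
```

With spaces around the hyphen, the hyphen becomes a token of its own, whereas "asia-civilization" stays a single token and "cold:" keeps its trailing colon.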

Our schema.xml contains the following:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Bernadette Houghton, Library Business Applications Developer
Deakin University Geelong Victoria 3217 Australia.
Phone: 03 5227 8230 International: +61 3 5227 8230
Fax: 03 5227 8000 International: +61 3 5227 8000
MSN: bern_houghton@hotmail.com
Email: bernadette.houghton@deakin.edu.au
Website: http://www.deakin.edu.au
Deakin University CRICOS Provider Code 00113B (Vic)


