lucene-solr-user mailing list archives

From Daniel Persson <mailto.wo...@gmail.com>
Subject Abbreviations with KeywordTokenizerFactory
Date Thu, 19 Apr 2012 08:25:23 GMT
Hi solr users.

I'm trying to create an index of geographic data to search with solr.

I've run into a problem with searches that contain abbreviations.

At the moment my index analyzer is

      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.ICUFoldingFilterFactory" />
      </analyzer>

This is because, at the moment, my searches need to match full keywords to
give correct hits and ranking.

I have other tokenizers for other types of searches.

The problem I have now is with streets that have names like

East Saint James Street.

This could be abbreviated as

E St James St

Does anyone have a suggestion for what to try?

My guess was to use synonyms, but that seems to work only with
WhitespaceTokenizer. I've thought about PatternReplaceCharFilter, but that
would need a lot of rules to cover all abbreviations.
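
To make the PatternReplaceCharFilter idea concrete, here is a rough, untested
sketch of what I have in mind. The three patterns are only my assumptions for
this one example (a trailing "St" means Street, a "St" followed by another
word means Saint, a leading "E" means East). They are not a complete rule set,
and a name like "Main St East" would be expanded wrongly. The same char
filters would presumably also be needed in the query analyzer.

```xml
<analyzer type="index">
  <!-- Char filters run on the raw text before the tokenizer sees it. -->
  <!-- Assumption: "St" at the very end of the name means "street". -->
  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="(?i)\bst\.?\s*$" replacement="street"/>
  <!-- Assumption: any remaining "St" followed by whitespace means "saint". -->
  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="(?i)\bst\.?(?=\s)" replacement="saint"/>
  <!-- Assumption: a single leading "E" means "east". -->
  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="(?i)^e\.?(?=\s)" replacement="east"/>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.ICUFoldingFilterFactory" />
</analyzer>
```

With rules like these, both "E St James St" and "East Saint James Street"
should end up as the same single keyword after folding, but every further
abbreviation (Ave, Rd, N, S, W and so on) needs its own rule.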

Best regards

Daniel
