lucene-solr-user mailing list archives

From Simon Martinelli <simon.martine...@gmail.com>
Subject solr.DictionaryCompoundWordTokenFilterFactory extracts words in string
Date Tue, 31 Mar 2015 15:03:11 GMT
Hi,

I configured solr.DictionaryCompoundWordTokenFilterFactory using a
dictionary with the following content:

- lindor
- schlitten
- dorsch
- filet

I want to index the following compound words:

- dorschfilet
- lindorschlitten

dorschfilet is split as expected:

dorsch filet

However, lindorschlitten is a compound of

lindor and schlitten

but I get

lindor dorsch schlitten

So the filter extracts dorsch even though the fragments before it (lin) and
after it (litten) are not valid word parts.
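
To illustrate the behaviour, here is a minimal Python sketch. It is not
Lucene's actual code, only a simplified model that assumes the filter emits
every dictionary word found as a substring of the token. The function names
are hypothetical, and min_subword / max_subword are only loosely modelled on
the filter's minSubwordSize / maxSubwordSize settings. The second function
shows the stricter behaviour I would like: accept a split only if the
dictionary words cover the whole token.

```python
DICTIONARY = {"lindor", "schlitten", "dorsch", "filet"}


def substring_matches(word, dictionary, min_subword=2, max_subword=15):
    """Simplified model: return every dictionary word found anywhere in `word`.

    Results are ordered by start offset. No check is made that the text
    before or after a match is itself a valid word part.
    """
    word = word.lower()
    found = []
    for start in range(len(word)):
        for length in range(min_subword, max_subword + 1):
            end = start + length
            if end > len(word):
                break
            part = word[start:end]
            if part in dictionary:
                found.append(part)
    return found


def full_cover_splits(word, dictionary):
    """Stricter alternative: return all splits that cover `word` completely.

    Each split is a list of dictionary words whose concatenation equals `word`.
    """
    word = word.lower()
    if not word:
        return [[]]
    splits = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in dictionary:
            for rest in full_cover_splits(word[end:], dictionary):
                splits.append([head] + rest)
    return splits


if __name__ == "__main__":
    for token in ("dorschfilet", "lindorschlitten"):
        print(token, substring_matches(token, DICTIONARY),
              full_cover_splits(token, DICTIONARY))
```

With the dictionary above, the substring model reproduces the unwanted dorsch
for lindorschlitten, while the full-cover variant only returns lindor and
schlitten.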

Is there any better compound word filter for German?

Thanks, Simon
