lucene-solr-user mailing list archives

From Batzenmann <axel.tetzl...@freiheit.com>
Subject Re: Howto concatenate tokens at index time (without spaces)
Date Wed, 01 Oct 2008 08:16:01 GMT


Otis Gospodnetic wrote:
> 
> I haven't used the German analyzer (either Snowball or the one we have in
> Lucene's contrib), but have you checked if that does the trick of keeping
> words together?
> 
I'm not sure how that could work for words that are separated by spaces,
especially since a WhitespaceTokenizer comes first in our filter chain.

I solved the problem for now by applying the following filter:

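import java.io.IOException;
import java.util.LinkedList;
import java.util.Queue;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

/**
 * Passes every input token through unchanged and, once the input is
 * exhausted, emits the concatenation of each pair of adjacent tokens.
 */
public class ConcatFilter extends TokenFilter {
    // Previous input token, or null before the first token has been read.
    private Token _last;
    // Concatenated pairs waiting to be emitted after the input ends.
    private Queue<Token> _concatVersions = new LinkedList<Token>();

    public ConcatFilter(TokenStream input) {
        super(input);
    }

    @Override
    public Token next() throws IOException {
        final Token next = input.next();
        if ( next != null ) {
            if ( _last != null ) {
                // Queue "previous + current" without a separating space.
                final String concatStr = _last.termText() + next.termText();
                // Offsets span the concatenated string (0..length),
                // not its position in the original text.
                _concatVersions.add(new Token(concatStr, 0, concatStr.length()));
            }
            _last = next;
            return next;
        } else if ( ! _concatVersions.isEmpty() ) {
            // Input is exhausted: drain the queued concatenations, one per call.
            return _concatVersions.poll();
        }
        return null;
    }
}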
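
To make the output order of the filter easier to see, here is a minimal sketch that mirrors the same logic on plain strings, with no Lucene dependency. The class name ConcatOrderSketch and the method concatAdjacent are hypothetical names made up for this illustration; they are not part of Lucene or of the filter above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

public class ConcatOrderSketch {

    // Hypothetical helper: applies the ConcatFilter logic to a list of strings
    // instead of a Lucene TokenStream.
    public static List<String> concatAdjacent(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        Queue<String> pending = new LinkedList<String>();
        String last = null;
        for (String tok : tokens) {
            if (last != null) {
                // Remember "previous + current" for later, without a space.
                pending.add(last + tok);
            }
            last = tok;
            // The original token is passed through immediately.
            out.add(tok);
        }
        // Only after the input ends are the concatenations emitted.
        while (!pending.isEmpty()) {
            out.add(pending.poll());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [foo, bar, baz, foobar, barbaz]
        System.out.println(concatAdjacent(Arrays.asList("foo", "bar", "baz")));
    }
}
```

So for whitespace-separated input, all original tokens come out first and the pairwise concatenations follow at the end of the stream. Only directly adjacent pairs are joined; longer runs such as "foobarbaz" are not produced.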
-- 
View this message in context: http://www.nabble.com/Howto-concatenate-tokens-at-index-time-%28without-spaces%29-tp19740271p19756337.html
Sent from the Solr - User mailing list archive at Nabble.com.

