lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: CJKBigramFilter - position bug with outputUnigrams?
Date Fri, 02 May 2014 10:04:19 GMT
>
> Would it be possible to implement an option with a name similar to
> "lastUnigramAtPreviousPosition" so that I can optionally get the
> behavior I'm after when the input is two or more characters, without
> changing current behavior for anyone else?  This would completely solve
> my current problem.
>

This is really not feasible. It sounds like multi-level n-grams in the
same field are a bad match for what you are doing (phrase queries,
etc.). This just doesn't work, and won't work, based on the mathematics.
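
One way to see the position problem is with a toy model. This is a sketch, not Lucene's implementation: it assumes that with outputUnigrams enabled each character's unigram sits at its own position and each bigram is stacked on the position of its first character. The function name and the `last_unigram_at_previous_position` flag are hypothetical, modeled on the option name proposed above.

```python
def bigrams_with_unigrams(text, last_unigram_at_previous_position=False):
    """Toy model: return (term, position) pairs for one run of CJK characters."""
    tokens = []
    for i, ch in enumerate(text):
        # Assumption: every unigram is emitted at its own position.
        tokens.append((ch, i))
        if i + 1 < len(text):
            # Assumption: the bigram shares the position of its first character.
            tokens.append((text[i:i + 2], i))
    if last_unigram_at_previous_position and len(text) >= 2:
        # Hypothetical option: pull the trailing unigram back one position.
        term, pos = tokens[-1]
        tokens[-1] = (term, pos - 1)
    return tokens


def positions(tokens):
    """Map each term to its position (the sample terms are all distinct)."""
    return {term: pos for term, pos in tokens}


current = positions(bigrams_with_unigrams("ABCD"))
proposed = positions(
    bigrams_with_unigrams("ABCD", last_unigram_at_previous_position=True)
)

# A phrase query for "C D" needs D exactly one position after C.
print(current["D"] - current["C"])    # 1: adjacent, the phrase can match
print(proposed["D"] - proposed["C"])  # 0: same position, the phrase cannot match
```

In this model, moving the last unigram back removes the trailing position, but C and D then share a position, so unigram phrase matching across them breaks. Whichever way the positions are assigned, one of the two n-gram levels ends up inconsistent.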

Try another approach, such as removing this filter completely; maybe
the word segmentation by ICU is good enough.
