lucene-java-user mailing list archives

From Vladimir Gubarkov <>
Subject Re: Two questions on RussianAnalyzer
Date Thu, 19 Apr 2012 20:51:30 GMT
Thank you, Robert, for the detailed reply.

On Fri, Apr 20, 2012 at 12:37 AM, Robert Muir <> wrote:
> On Thu, Apr 19, 2012 at 7:26 AM, Vladimir Gubarkov <> wrote:
>> New analyzer:
>> [, 8888, a, b, c, d'e, f, g, h, i, j, k, l_m, n, o, p, q,
>> r, s, t, u, v, z, y, z]
>> Old analyzer:
>> [aaa, bbb, com, 8888, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p,
>> q, r, s, t, u, v, z, y, z]
>> Please note the differences.
> Right, the tokenizer has changed. This is mentioned in the javadocs:
>> The most uncomfortable in new behaviour to me is that in past I used
>> to search by subdomain like
>> and have displayed results with, and
>> so on. Now I have 0 results.
> Don't simply set your version parameter to 3.6 without reindexing.
> This is really important!!!!!!!!!!!
> Otherwise it defeats the whole purpose.

Hmmm... I know this, and I did reindex!
I'll try to explain the problem once again (fortunately, it is already
solved by using LUCENE_30):
When indexing with the new analyzer, the whole lexeme ""
goes into the index, not the 4 lexemes "some", "cool", "site", "com".
So it's now impossible to find this document with the query: "".
I have an RSS subscription for that search, and now it's broken.
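
The difference between the two behaviours can be illustrated with a minimal, Lucene-free sketch. It is not Lucene's code: the class TokenizerSketch, both method names, and the hostname "sub.example.com" are hypothetical, and the second method is only a rough approximation of the newer word-break rules (it keeps '.', ''' and '_' when they sit between letters or digits).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two tokenization styles discussed above.
// It does not use or reproduce any Lucene class.
final class TokenizerSketch {

    // Older style: every character that is not a letter or digit ends a token.
    static List<String> splitOnNonLetters(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    // Newer style (simplified assumption): '.', '\'' and '_' do not end a
    // token when they are surrounded by letters or digits.
    static List<String> keepInnerPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean joiner = c == '.' || c == '\'' || c == '_';
            boolean nextIsWordChar = i + 1 < text.length()
                    && Character.isLetterOrDigit(text.charAt(i + 1));
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (joiner && current.length() > 0 && nextIsWordChar) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        String host = "sub.example.com"; // hypothetical hostname
        System.out.println(splitOnNonLetters(host));    // [sub, example, com]
        System.out.println(keepInnerPunctuation(host)); // [sub.example.com]
        // A query for the single term "example" matches only the first list.
        System.out.println(splitOnNonLetters(host).contains("example"));    // true
        System.out.println(keepInnerPunctuation(host).contains("example")); // false
    }
}
```

With the second style the index holds one term for the whole hostname, so a query for any single part of it finds nothing, which matches the 0 results described above.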
