lucene-java-user mailing list archives

From Dawid Weiss <dawid.we...@gmail.com>
Subject Re: Why do the Japanese analyser FST files change every release?
Date Fri, 07 Aug 2015 06:26:31 GMT
It is (b): the dictionary hasn't changed; the files change when the FST format changes.

D.

On Fri, Aug 7, 2015 at 3:05 AM, Trejkaz <trejkaz@trypticon.org> wrote:
> I have recently done updates from Lucene 3.6 to 4.x and 4.x to 5.2.
>
> During this process, I noticed that the FST used by the Japanese
> analyser (AKA Kuromoji) was changing between releases. Because I
> worry about backwards-compatibility breakages, I suspected that the
> dictionary had changed, so I wrote a little program to read it in and
> print the words out in order.
>
> What I found is that in all three releases, the list of words is
> exactly the same, even though the files have changed subtly from
> release to release.
>
> What's up with that? I can think of a few possibilities:
>
> (a) the dictionary _has_ actually changed, and merely printing the
> list of words was not enough (e.g., the parts of speech changed)
>
> (b) the dictionary hasn't changed, but the files change when the FST
> format changes
>
> (c) the dictionary hasn't changed, but the files change because
> they're built on demand every time Lucene is built and there is
> something non-deterministic about the process (e.g. something is using
> a HashMap internally).
>
> I'm hoping that it's (b), but does anybody know?
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
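
The thread does not include the program used to dump the dictionary. As a minimal sketch, assuming the entries have already been dumped as one text line each (surface form plus whatever fields, such as part of speech, the dump includes), the hypothetical `DictionaryFingerprint` class below compares releases by content rather than by file bytes. Hashing every field, not just the surface form, would catch a change of the kind described in (a); sorting the lines before hashing makes the result independent of iteration order, the concern raised in (c). Reading the entries out of the Kuromoji FST itself is not shown, since that depends on Lucene internals that differ between versions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Lucene: fingerprints a dictionary dump by
// content so that two releases can be compared independently of file bytes.
public class DictionaryFingerprint {

    // Each element is one canonical entry line, e.g. "surface\tpartOfSpeech".
    // Assumption: the dump already contains every field worth comparing.
    static String fingerprint(Collection<String> entryLines) {
        // Copy and sort so the result does not depend on iteration order
        // (for example, the order a HashMap happens to return entries in).
        List<String> sorted = new ArrayList<>(entryLines);
        Collections.sort(sorted);

        MessageDigest sha;
        try {
            sha = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
        for (String line : sorted) {
            sha.update(line.getBytes(StandardCharsets.UTF_8));
            sha.update((byte) '\n'); // separator so "ab","c" differs from "a","bc"
        }

        // Render the 32-byte digest as 64 lowercase hex characters.
        StringBuilder hex = new StringBuilder();
        for (byte b : sha.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Same entries in a different order: fingerprints match.
        List<String> releaseA = List.of("tokyo\tnoun", "iku\tverb");
        List<String> releaseB = List.of("iku\tverb", "tokyo\tnoun");
        // Same words, one part of speech changed: a words-only dump would
        // look identical, but the fingerprint differs.
        List<String> releaseC = List.of("tokyo\tnoun", "iku\tnoun");

        System.out.println(fingerprint(releaseA).equals(fingerprint(releaseB)));
        System.out.println(fingerprint(releaseA).equals(fingerprint(releaseC)));
    }
}
```

Run as-is, `main` prints `true` and then `false`. Matching fingerprints across releases would indicate that only the file encoding changed, which is what (b) describes.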


