lucene-java-user mailing list archives

From "Uwe Schindler" <>
Subject RE: Upgrading Lucene 4 index to 5 doesn't update it - for just some indices
Date Mon, 06 Jul 2015 08:16:28 GMT

> On Mon, Jul 6, 2015 at 4:32 PM, Uwe Schindler <> wrote:
> > Hi,
> >
> > It could be the reason for this is your classpath:
> >
> > If you load all Lucene versions into the same classloader (but with different
> > package names - I assume you use Maven Shade plugin to do this), Lucene 3
> > will load perfectly, yes; Lucene 4 will also load perfectly, yes! But when it tries
> > to load Lucene 5, it will fail to load all shipped codecs. Codecs are not
> > identified by their Java package name, but by the symbolic name (like
> > "Lucene50") as written into the index. The SPI interface of Lucene will load all
> > codecs from classpath and save them in a lookup map based on the symbolic
> > name. If the Lucene 4 JAR files are placed before Lucene 5 JARs, the "slots" for
> > codec names are already taken (because the Lucene 5 loader will see the
> > Lucene 4 codecs first), so loading Lucene 5 variants of old codecs is a no-op.
> > This may cause those problems, because Lucene 5 ships with "modified"
> > versions of the old Lucene 4 codecs - but they are not identical.
> >
> > You can only work around this by loading the Lucene JARs into completely
> > different classloaders (don't forget to also set context classloader!). In that
> > case you would not even need to change package names!
> Actually, because I did prefix the names, the SPI filenames are also prefixed
> too. So I have the file:

The SPI names are not the class names! They are names like "Lucene50" or "Lucene43", as
written to the index files!

>     META-INF/services/org.trypticon.luceneupgrader.lucene4.internal.lucene.codecs.Codec
> And then inside that:
> org.trypticon.luceneupgrader.lucene4.internal.lucene.codecs.lucene40.Lucene40Codec
> org.trypticon.luceneupgrader.lucene4.internal.lucene.codecs.lucene40.Lucene3xCodec
>     ...
> And of course, for Lucene 4, this is being loaded by
> org.trypticon.luceneupgrader.lucene4.internal.lucene.util.NamedSPILoader.

That's all fine, and I understood that you did this!

> So maybe they won't clash after all.

The problem is not the class names; the problem is the names as written to the index. A
Lucene 4 index records, e.g., "Lucene47" as its Codec and PostingsFormat, and the lookup is
done by that name: IndexReader/IndexWriter call Codec.forName("Lucene47"). For this to
work, every codec carries its symbolic name. On SPI discovery, it will of course find
both of your codecs (the "Lucene47" one from Lucene 4.10 and the one from backwards-codecs.jar
in Lucene 5.x). Depending on which one is earlier in the classpath, it will load only one of those.
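
To make the "slot already taken" behaviour concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's actual loader code: FirstWinsRegistry and FakeCodec are hypothetical names invented for illustration, and the origin strings are just labels. It only assumes what is described above, namely that services are registered by symbolic name in discovery order and that a name which is already registered is not replaced.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a name-keyed service registry; NOT Lucene's real code. */
public class FirstWinsRegistry {

    /** Hypothetical stand-in for a codec: a symbolic name plus a label for where it came from. */
    static final class FakeCodec {
        final String name;
        final String origin;

        FakeCodec(String name, String origin) {
            this.name = name;
            this.origin = origin;
        }
    }

    private final Map<String, FakeCodec> byName = new LinkedHashMap<>();

    /** Registers codecs in discovery (classpath) order; a name that is already taken is skipped. */
    void register(List<FakeCodec> discovered) {
        for (FakeCodec codec : discovered) {
            byName.putIfAbsent(codec.name, codec);
        }
    }

    /** Looks a codec up by the symbolic name, the way an index file would reference it. */
    FakeCodec forName(String name) {
        FakeCodec codec = byName.get(name);
        if (codec == null) {
            throw new IllegalArgumentException("No codec named " + name);
        }
        return codec;
    }

    public static void main(String[] args) {
        FirstWinsRegistry registry = new FirstWinsRegistry();
        // Assumed discovery order: the Lucene 4 JAR sits before the Lucene 5 JARs.
        registry.register(Arrays.asList(
            new FakeCodec("Lucene47", "Lucene 4.10 core"),
            new FakeCodec("Lucene47", "Lucene 5.x backward codecs"),
            new FakeCodec("Lucene50", "Lucene 5.x core")));
        // The second "Lucene47" was silently ignored, so this prints the Lucene 4.10 label.
        System.out.println(registry.forName("Lucene47").origin);
    }
}
```

Swapping the first two entries flips the result, which is why the outcome depends on the classpath order.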

The current approach may work with Lucene 5 and Lucene 4.10 (because the API did not really
change), but later 5.x versions may change the API of codecs internally, and then this will
fail. Sorry. You have to use different classloaders to be safe, see the issue about that!
You will get crashes like NoSuchMethodError, ...

So please, please: load the IndexUpgraders in different classloaders (and don’t change their
package name, it is not needed then).
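
A rough sketch of what I mean, using only JDK classes. IsolatedLoaderSketch, isolatedLoader and callWithContextLoader are hypothetical names made up for this example, and the JAR lists are left empty so that it runs standalone; in real use you would pass the URLs of one Lucene version's JARs to each loader. The parent is the JDK-only loader, which is an assumption about how much isolation you want.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;

/** Sketch: one isolated class loader per Lucene version, with the context loader set per call. */
public class IsolatedLoaderSketch {

    /** Builds a loader that sees only the given JARs plus the JDK, not the application classpath. */
    static URLClassLoader isolatedLoader(URL... jars) {
        // Parent of the system loader: extension loader on Java 8, platform loader on Java 9+.
        ClassLoader jdkOnly = ClassLoader.getSystemClassLoader().getParent();
        return new URLClassLoader(jars, jdkOnly);
    }

    /** Runs the task with the given loader as context class loader and restores the old one afterwards. */
    static <T> T callWithContextLoader(ClassLoader loader, Callable<T> task) throws Exception {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return task.call();
        } finally {
            thread.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        // Empty JAR lists here; each loader would normally get exactly one Lucene version.
        try (URLClassLoader lucene4 = isolatedLoader();
             URLClassLoader lucene5 = isolatedLoader()) {
            ClassLoader seen = callWithContextLoader(lucene5,
                () -> Thread.currentThread().getContextClassLoader());
            System.out.println("context loader is the Lucene 5 loader: " + (seen == lucene5));
        }
    }
}
```

Inside the task you would load the entry point reflectively from that loader (for example org.apache.lucene.index.IndexUpgrader via loader.loadClass) and invoke it there, so that each version only ever sees its own codecs.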

> If I found Lucene using "org.apache.lucene" in string constants, I could have
> updated that as well. :)
> I did identify some other problems during the day:
> - segment-less indices don't get upgraded. (raised LUCENE-6658)
> - I had a logic flip bug in the version determination code. (explained one of
> those cases where it said it didn't upgrade it, but not all of
> them.)
> - segment info version bumped to 5 which I had treated as an error, but oh
> well, I decided to stay on the side of safety with that.
> I'm hoping that the segment-less index issue is the underlying cause of the
> few remaining failures on my end.

That should do the trick if you just do a commit with user data on the index after the upgrade.
Just execute IndexUpgrader manually with an IndexWriter at hand and call commit() with user
metadata (I think that enforces the commit even though there are no changes).

