lucene-java-user mailing list archives

From David Allouche <>
Subject Re: Live index upgrading
Date Fri, 21 Jun 2019 14:10:50 GMT
Wow. That is annoying. What is the reason for this?

I assumed there was a smooth upgrade path, but apparently, by design, one has to rebuild the
index at least once every two major releases.
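
A toy model of the rule as the quoted message below describes it may help. This
is not Lucene code: Segment, merge and can_open are made-up names, and the
"reader of major version N accepts markers from N-1 or newer" check is an
assumption based on that explanation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Segment:
    # Earliest Lucene major version that ever contributed to this segment.
    created_major: int


def merge(*segments: Segment) -> Segment:
    """Merging keeps the earliest marker among the merged segments."""
    return Segment(created_major=min(s.created_major for s in segments))


def can_open(reader_major: int, segments: Sequence[Segment]) -> bool:
    """Assumed rule: major version N opens markers of N-1 or newer only."""
    return all(s.created_major >= reader_major - 1 for s in segments)


# One segment written by Lucene 6, one by Lucene 7, then merged together.
merged = merge(Segment(created_major=6), Segment(created_major=7))

opens_with_7 = can_open(7, [merged])  # the marker is still 6, which 7 accepts
opens_with_8 = can_open(8, [merged])  # the surviving 6 marker is rejected by 8
```

Under this model, rewriting or merging never raises the marker, so only an
index built from scratch on the newer version gets past the check.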

So, my question becomes, what is the recommended way of dealing with reindex-from-scratch
without service interruption? 

So I guess the upgrade path looks something like:
- Create Lucene6 index
- Update Lucene6 index
- Create Lucene7 index
- Separately keep track of which documents are indexed in Lucene7 and Lucene6 indexes
- Make updates to Lucene6 index, concurrently build Lucene7 index from scratch, use Lucene6
index for search.
- When Lucene7 index is fully built, remove Lucene6 index and use Lucene7 index for search.
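
The steps above can be sketched roughly as follows. This is only an outline
under stated assumptions: InMemoryIndex is a hypothetical stand-in for a real
index (it makes no Lucene or PyLucene calls), LiveReindexer is a made-up name,
and the backfill assumes all documents can be re-read from some system of
record. Deletions are not handled.

```python
import threading


class InMemoryIndex:
    """Hypothetical stand-in for one on-disk index (not a Lucene class)."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def update(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, term):
        return sorted(d for d, t in self.docs.items() if term in t)


class LiveReindexer:
    """Serves searches from the old index while the new one is rebuilt."""

    def __init__(self, old_index, new_index):
        self._lock = threading.Lock()
        self._new = new_index
        self._targets = [old_index, new_index]  # indexes receiving live updates
        self._active = old_index                # index used for search

    def update(self, doc_id, text):
        # Live updates go to every target so the new index never falls behind.
        with self._lock:
            for index in self._targets:
                index.update(doc_id, text)

    def backfill(self, source_docs):
        # Rebuild from the system of record. A document already present in the
        # new index came from a live update, which is newer, so it is skipped.
        for doc_id, text in source_docs:
            with self._lock:
                if doc_id not in self._new.docs:
                    self._new.update(doc_id, text)

    def switch_over(self):
        # Once the backfill is done, search and write only to the new index.
        with self._lock:
            self._active = self._new
            self._targets = [self._new]

    def search(self, term):
        with self._lock:
            return self._active.search(term)


# Usage: a live update arrives while the rebuild is still running.
source = {1: "alpha", 2: "beta"}
old = InMemoryIndex("lucene6")
for doc_id, text in source.items():
    old.update(doc_id, text)
new = InMemoryIndex("lucene7")

reindexer = LiveReindexer(old, new)
reindexer.update(2, "beta gamma")
reindexer.backfill(source.items())
reindexer.switch_over()
hits = reindexer.search("gamma")
```

Here the "which documents are in which index" bookkeeping is reduced to a
membership check on the new index, which only works if document ids are stable
across both indexes.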

Rinse and repeat every major version.

Really, isn't there something simpler already to handle Lucene major version upgrades?

> On 17 Jun 2019, at 18:04, Erick Erickson <> wrote:
> Let’s back up a bit. What version of Lucene are you using? Starting with Lucene 8,
> any index that’s ever been touched by Lucene 6 will not open. It does not matter if the
> index has been completely rewritten. It does not matter if it’s been run through IndexUpgraderTool,
> which just does a forceMerge to 1 segment. A marker is preserved when a segment is created,
> and the earliest one is preserved across merges. So say you have two segments, one created
> with 6 and one with 7. The Lucene 6 marker is preserved when they are merged.
> Now, if any segment has the Lucene 6 marker, the index will not be opened by Lucene.
> If you’re using Lucene 7, then this error implies that one or more of your segments
> was created with Lucene 5 or earlier.
> So you probably need to re-index from scratch on whatever version of Lucene you want
> to use.
> Best,
> Erick
>> On Jun 17, 2019, at 8:41 AM, David Allouche <> wrote:
>> Hello,
>> I use Lucene with PyLucene on a public-facing web application. We have a moderately
>> large index (~24M documents, ~11GB index data), with a constant stream of new documents.
>> I recently upgraded to PyLucene 7.
>> When trying to test the new release of PyLucene 8, I encountered an IndexFormatTooOld
>> error because my index conversion from Lucene6 to Lucene7 was not complete.
>> I found IndexUpgrader, and I had a look at its implementation. I would very much
>> like to avoid taking down the service during the index upgrade, so I believe I cannot use
>> IndexUpgrader because I need the write lock to be held by the web application to index new
>> documents.
>> So I figure I could get the desired result with an IndexWriter.forceMerge(1). But
>> the documentation says "This is a horribly costly operation, especially when you pass a small
>> maxNumSegments; usually you should only call this if the index is static (will no longer be
>> changed)."
>> And indeed, forceMerge tends to be killed by the kernel OOM killer on my development VM.
>> I want to avoid this failure mode in production. I could increase the VM until it works, but
>> I would rather have a less brutal approach to upgrading a live index. Something that could
>> run in the background with reasonable amounts of anonymous memory.
>> What is the recommended approach to upgrading a live index?
>> How can I know from the code that the index needs upgrading at all? I could add a
>> manual knob to start an upgrade, but it would be better if it occurred transparently when
>> I upgrade PyLucene.
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail: