lucene-dev mailing list archives

From "Tim Smith (JIRA)" <>
Subject [jira] [Commented] (LUCENE-5940) change index backwards compatibility policy.
Date Fri, 19 Sep 2014 15:07:34 GMT


Tim Smith commented on LUCENE-5940:

bq. Reindexing is part and parcel of search

I think the general goal should be that this is not the case, especially as search is adopted
more and more as a replacement for systems that do not have these limitations/requirements
(databases). Obviously this is an ambitious goal that can likely never be fully realized.

Also, "reindexing" comes in two distinct flavors:
* cold reindexing - rm -rf the index dir, re-feed everything
** requires 2x hardware or downtime
* live reindexing - change config, restart the system, re-feed all docs; the change is "live"
once all docs have been reindexed
** obviously a good idea to snapshot the previous index and config so you can restore later
on error
** minimal downtime (just a restart)
** minimal search interruption (some queries related to the change may not match old documents
until the reindex is complete)
** old content can be replaced slowly over time to receive full functionality
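A minimal sketch of the live-reindexing flow above (the Index class and its methods here are hypothetical stand-ins for a real search engine, not any actual Lucene API):

```python
# Toy model of live reindexing: the index stays online while docs are
# re-fed one at a time under a new config. The Index class is hypothetical.

class Index:
    """In-place index: docs keyed by id; re-fed docs overwrite old versions."""
    def __init__(self):
        self.docs = {}

    def feed(self, doc_id, fields):
        self.docs[doc_id] = fields  # live update: replaces any older version

    def search(self, field, value):
        return sorted(i for i, f in self.docs.items() if f.get(field) == value)

# 1. Index is live under the old config (no "lang" field).
idx = Index()
for i in range(3):
    idx.feed(i, {"body": "text"})

# 2. Config change adds a "lang" field; re-feed docs one at a time.
#    Queries on the new field only match re-fed docs until the pass completes.
idx.feed(0, {"body": "text", "lang": "en"})
assert idx.search("lang", "en") == [0]        # partial coverage mid-reindex

idx.feed(1, {"body": "text", "lang": "en"})
idx.feed(2, {"body": "text", "lang": "en"})
assert idx.search("lang", "en") == [0, 1, 2]  # full coverage once done
```

The point of the sketch is the interruption model: old-config queries keep working throughout, while new-config queries return partial results until the re-feed pass finishes.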

Live reindexing does have lots of pitfalls and may not always be viable. For instance, right
now it is not possible to add offsets to an index using this approach: as soon as a new
segment is merged with an old one, the offsets are blown away. I had filed a ticket for this;
I'm not looking to reopen old wounds here, just pointing out an issue I ran into and had to
work around.

Live reindexing is the goal I strive for whenever reindexing is required (always with the
caveat to back up your index first for safety). Some smart choices when designing the
internal schema can reduce or eliminate many prospective issues here, even without any core
changes to Lucene.
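One such schema choice, sketched below purely as an illustration (the schema_version field name and helper are assumptions of mine, not anything Lucene provides): stamp every document with the version of the internal schema it was indexed under, so a later live reindex can tell which documents still need re-feeding.

```python
# Hypothetical schema-versioning convention: each doc records the internal
# schema version it was indexed under, so stale docs are easy to find.

CURRENT_SCHEMA = 2

docs = [
    {"id": 1, "schema_version": 1, "body": "indexed under the old schema"},
    {"id": 2, "schema_version": 2, "body": "indexed under the new schema"},
]

def needs_refeed(doc, target=CURRENT_SCHEMA):
    # Older-schema docs still answer old queries, but must be re-fed
    # before new-schema features apply to them. Missing version counts as 0.
    return doc.get("schema_version", 0) < target

stale = [d["id"] for d in docs if needs_refeed(d)]
assert stale == [1]
```

A version field like this also gives you a cheap progress metric for the re-feed pass (count of docs below the target version).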

bq. it's strongly recommended that it be gathered into an intermediate store

These recommendations are always valid to make (and I will make them); however, this adds an
entirely new system to the mix, along with new hardware, services, maintenance, security, etc.
Also, given the scale and complexity of the documents, this may not even be enough: it may
still require a large amount of processing hardware to re-process the documents fast enough
to feed the index in a reasonable amount of time (days rather than months). In general, this
is just extra complexity that will be dropped due to the higher price tag and maintenance
cost. Then, when it finally is time to upgrade, the end-user expectation is "oh, we already
have the data indexed, why can't we just use that with the new software?" This expectation
is set because many customers/users are used to working with databases. I do not have this
expectation myself, but I have people downstream who do, and I need to do my best to
accommodate them whether I like it or not.

Note: I'm not trying to force any requirements on Lucene devs, or soliciting advice on
specific functionality, just pointing out some real-world use cases I encounter that relate
to this discussion.

> change index backwards compatibility policy.
> --------------------------------------------
>                 Key: LUCENE-5940
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
> Currently, our index backwards compatibility is unmanageable. The length of time for which
> we must support old indexes is simply too long.
> The index back compat works like this: everyone wants it, but there are frequently bugs,
> and when push comes to shove, it's not a very sexy thing to work on/fix, so it's hard to
> get any help.
> Currently our back compat "promise" is just a broken promise, because we cannot actually
> guarantee it for these reasons.
> I propose we scale back the length of time for which we must support old indexes.

This message was sent by Atlassian JIRA

