lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Lucene indexes reverting to past state
Date Wed, 26 Aug 2015 13:14:29 GMT
Are you calling IndexWriter.commit when you shut down the app?
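
In Lucene 4.x, changes only become durable once IndexWriter.commit() (or close()) has run, while near-real-time readers opened from the writer can already see uncommitted changes. If the dynamic updates are never committed, a restart would fall back to the last commit point, which would be consistent with the symptoms described. Below is a minimal sketch of committing at shutdown, not a definitive implementation. The class CommitOnShutdown and the interface Committer are hypothetical names introduced for illustration; they stand in for the writer so the sketch needs only the JDK.

```java
import java.io.Closeable;
import java.io.IOException;

/** Sketch: commit and close an index writer when the JVM shuts down. */
final class CommitOnShutdown {

    /** Hypothetical stand-in for the single writer method used here; IndexWriter::commit fits it. */
    interface Committer {
        void commit() throws IOException;
    }

    /** Commits pending changes, then closes the writer; close() still runs if commit() throws. */
    static void commitAndClose(Committer committer, Closeable writer) throws IOException {
        try {
            committer.commit();
        } finally {
            writer.close();
        }
    }

    /** Registers a JVM shutdown hook that calls commitAndClose, and returns the hook thread. */
    static Thread register(Committer committer, Closeable writer) {
        Thread hook = new Thread(() -> {
            try {
                commitAndClose(committer, writer);
            } catch (IOException e) {
                System.err.println("Final commit failed; recent changes may be lost: " + e);
            }
        }, "index-commit-on-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

With a real writer this would be called as CommitOnShutdown.register(writer::commit, writer), since IndexWriter implements Closeable. A shutdown hook does not run if the process is killed with SIGKILL, so in a Spring application a bean destroy callback may be a better place for the same two calls.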

Mike McCandless

http://blog.mikemccandless.com


On Tue, Aug 25, 2015 at 11:49 PM, Loamy Hound <loamy.hound@gmail.com> wrote:
> *Summary:*
>
> Lucene indexes appear to revert to some past state after an application
> restart.
>
> *Background:*
>
> We're running an enterprise application written in Java/Spring/Hibernate,
> deployed within Jetty, with a Postgres backend. See below for version info.
>
> We use Lucene to index certain components of the database to enable
> fast/complex searching.
>
> The indexes are built by querying the relevant database tables,
> transferring the data to Lucene documents and writing to disk.
>
> An IndexWriter is used to add and commit the documents. A commit is
> performed at the end of a batch of database reads (generally 5,000). The
> reading and writing of batches is multi-threaded.
>
> The writer is configured with the following TieredMergePolicy attributes:
>
> segmentsPerTier=50.0
> maxMergeAtOnce=5
> maxMergedSegmentMB=100.0
>
>
> No merge scheduler is set. The writer has its RAMBufferSizeMB set to 48.
>
> There are 23 separate indexes used to represent different logical
> components of the database.
>
> The largest index on disk is 13.7G.
>
> The largest index by number of documents contains around 32 million
> documents.
>
> Once the indexes are built they are maintained dynamically by the
> application to reflect the current state of the database. Dynamic updates
> are performed by a TrackingIndexWriter.
>
> *Problem:*
>
> After a reindex is run (as described above, a destructive process) the
> application runs okay and all Lucene queries return expected values that
> reflect the current state of the database.
>
> Subsequent usage of the system maintains the indexes in the correct state
> as evidenced by search results.
>
> In the last month we have found that after a restart of the application the
> indexes appear to revert to some unknown past state. The indexes can be
> queried okay (they're not corrupt, there are no logged errors or stack
> traces) but the data is either out of date (reflecting a past state of the
> database entries they represent) or missing.
>
> We first assumed the "past state" was based on the last reindex time, but
> have subsequently found that restarting the application immediately
> following a reindex still puts the indexes in a state that pre-dates the
> time of the last reindex.
>
> This is only occurring on a single site (our largest production site), and
> has only started in recent months. We have yet to reproduce the problem
> using an identical process with an identical configuration on
> near-identical data.
>
> We are not sure if the problem affects all of the indexes but know the
> larger (and most important) indexes are affected.
>
> *Question:*
>
> We are inclined to think that the problem is somewhere in our code, but are
> wondering if any of the described symptoms have been seen before by the
> Lucene community. Suggestions on how to isolate the problem, or
> configuration changes that may help, are also most welcome.
>
> *Version Info:*
>
> Lucene:
>
> lucene-analyzers-common-4.9.1.jar
> lucene-core-4.9.1.jar
> lucene-grouping-4.9.1.jar
> lucene-join-4.9.1.jar
> lucene-misc-4.9.1.jar
> lucene-queries-4.9.1.jar
> lucene-queryparser-4.9.1.jar
> lucene-sandbox-4.9.1.jar
> lucene-snowball-2.4.1.jar
> lucene-suggest-4.9.1.jar
>
> Postgres:
>
> server: PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
> 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
> client access: postgresql-9.1-901.jdbc4.jar
>
> OS:
>
> LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch
> Red Hat Enterprise Linux Server release 6.5 (Santiago)
>
> Java:
>
> java version "1.8.0_45"
> Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
>
> Jetty:
>
> jetty-6.1.22.jar
>
> Hibernate:
>
> hibernate-commons-annotations-4.0.2.Final.jar
> hibernate-core-4.2.2.Final.jar
> hibernate-ehcache-4.2.2.Final.jar
> hibernate-jpa-2.0-api-1.0.1.Final.jar
>
> Spring:
>
> spring-aop-4.0.4.RELEASE.jar
> spring-aspects-4.0.4.RELEASE.jar
> spring-beans-4.0.4.RELEASE.jar
> spring-context-4.0.4.RELEASE.jar
> spring-context-support-4.0.4.RELEASE.jar
> spring-core-4.0.4.RELEASE.jar
> spring-expression-4.0.4.RELEASE.jar
> spring-instrument-4.0.4.RELEASE.jar
> spring-jdbc-4.0.4.RELEASE.jar
> spring-jms-4.0.4.RELEASE.jar
> spring-orm-4.0.4.RELEASE.jar
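
One way to isolate the problem asked about in the quoted message is to check when each index was last committed. Every Lucene commit writes a new segments_N file into the index directory, where N is the commit generation encoded in base 36. The sketch below uses only the JDK to find the newest such file and print its modification time; the class name LastCommitCheck is hypothetical, and the index directory path is supplied by the caller.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: report the newest Lucene commit point (segments_N file) in an index directory. */
final class LastCommitCheck {

    /** Parses the generation from a name such as "segments_1a"; returns -1 for any other file. */
    static long generationOf(String fileName) {
        String prefix = "segments_";
        if (!fileName.startsWith(prefix) || fileName.length() == prefix.length()) {
            return -1L;
        }
        try {
            // The generation suffix is a base-36 number.
            return Long.parseLong(fileName.substring(prefix.length()), Character.MAX_RADIX);
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    /** Returns the segments_N file with the highest generation, or null if there is none. */
    static Path latestCommitFile(Path indexDir) throws IOException {
        Path best = null;
        long bestGen = -1L;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "segments_*")) {
            for (Path file : files) {
                long gen = generationOf(file.getFileName().toString());
                if (gen > bestGen) {
                    bestGen = gen;
                    best = file;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LastCommitCheck <index-directory>");
            return;
        }
        Path latest = latestCommitFile(Paths.get(args[0]));
        if (latest == null) {
            System.out.println("No segments_N file found");
        } else {
            System.out.println(latest.getFileName() + " last modified "
                    + Files.getLastModifiedTime(latest));
        }
    }
}
```

Running this against each of the 23 index directories just before a restart, and comparing the timestamps with the time of the last reindex or update, would show whether the most recent changes had been committed at all.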
