lucene-java-user mailing list archives

From Trejkaz <trej...@trypticon.org>
Subject Re: IntField to IntPoint
Date Wed, 05 Jun 2019 23:03:08 GMT
How we would do it:

- update the index format to v7 (this in itself is fiddly,
  but there are ways)
- open the index so that it appears migrated in place:
    - get all the leaf readers and wrap each in a new
      subclass of FilterCodecReader
    - override getPointsReader() on that subclass
      to return a correctly implemented PointsReader,
      which can read the data from the stored fields
    - be careful about the order in which you return the points
    - you might want to spool the points to a
      database like Derby or H2, since with a lot
      of data there is a risk of running out of memory
- copy that whole index to a new index using
  IndexWriter#addIndexes(CodecReader...)
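
One detail a hand-written PointsReader has to get right is the packed byte form of each value. The sketch below is stand-alone Java, not Lucene code: it mirrors what Lucene's NumericUtils.intToSortableBytes does for IntPoint values (flip the sign bit, then write the int big-endian), so that unsigned byte order matches numeric order. The class and method names are made up for illustration, and it assumes one-dimensional int points.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; in a real migration you would
// call Lucene's own NumericUtils rather than re-implement the encoding.
final class SortableIntBytes {

    // Flip the sign bit, then write the four bytes big-endian.
    static byte[] encode(int value) {
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    // Inverse of encode(): rebuild the int, then flip the sign bit back.
    static int decode(byte[] bytes) {
        int flipped = ((bytes[0] & 0xFF) << 24)
                | ((bytes[1] & 0xFF) << 16)
                | ((bytes[2] & 0xFF) << 8)
                | (bytes[3] & 0xFF);
        return flipped ^ 0x80000000;
    }

    // Compare two 4-byte packed values byte by byte, treating bytes as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < 4; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Values as they might come back from the stored fields.
        int[] stored = {42, -7, 0, Integer.MIN_VALUE, Integer.MAX_VALUE};
        byte[][] packed = new byte[stored.length][];
        for (int i = 0; i < stored.length; i++) {
            packed[i] = encode(stored[i]);
        }
        // Sorting the packed form by unsigned bytes yields ascending numeric order.
        Arrays.sort(packed, SortableIntBytes::compareUnsigned);
        for (byte[] p : packed) {
            System.out.println(decode(p));
        }
    }
}
```

If you spool the points to a database first, storing this packed form (or the raw int) next to the doc ID lets the database do the sorting for you.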

Copying the docs works too if you still have the original text stored, but
we didn’t, so we use this sort of technique for all Lucene migrations.

TX


On Thu, 6 Jun 2019 at 07:07, Riccardo Tasso <riccardo.tasso@gmail.com>
wrote:

> Ok,
>  I know this policy and you perfectly explained why it makes sense.
>
> Anyway, my index is really big and contains mostly textual data, which is
> expensive to reindex (because of custom analysis).
>
> Considering that the IndexUpgrader will efficiently do most of the work,
> I should investigate how to fill this gap without reindexing from scratch.
>
>
> The most efficient approach I can think of is:
> * convert from 4 to 7
> * open an index reader and an index writer on the 7 index
> * iterate every document
> * read the numeric field (since it's already stored)
> * add to each document the IntPoint field
> * update the document on the index
>
> I guess the expensive task here is the update, since it will delete and
> re-add the document, but in this case I think I will save the analysis
> costs.
>
> Do you think there's a better way of doing this reindex?
>
> Thanks
>
>
> On Wed, 5 Jun 2019, 17:41 Erick Erickson <erickerickson@gmail.com>
> wrote:
>
> > You cannot upgrade more than one major version; you must re-index from
> > scratch. There’s a long discussion of why, but basically it’s summed up by
> > this quote from Robert Muir:
> >
> > “I think the key issue here is Lucene is an index not a database. Because
> > it is a lossy index and does not retain all of the user's data, its not
> > possible to safely migrate some things automagically. In the norms case
> > IndexWriter needs to re-analyze the text ("re-index") and compute stats to
> > get back the value, so it can be re-encoded. The function is y = f(x) and
> > if x is not available its not possible, so lucene can't do it.”
> >
> > This has always been true; before 8.x it would just fail silently, as you
> > have found. Solr/Lucene starts up but doesn’t work quite as expected. As of
> > Lucene 8.x, Lucene (and therefore Solr) will not even open an index that
> > has _ever_ been touched by Lucene 6.x, no matter what intervening steps
> > have been taken. Or in general, starting with 8.x, Lucene/Solr X will not
> > open indexes touched by X-2, rather than behave unexpectedly.
> >
> > Best,
> > Erick
> >
> > > On Jun 5, 2019, at 8:27 AM, Riccardo Tasso <riccardo.tasso@gmail.com>
> > > wrote:
> > >
> > > Hello everybody,
> > > I have a (very big) Lucene 4 index with documents using IntField. On that
> > > field, which should be stored and sortable, I need to search and execute
> > > range queries.
> > >
> > > I've tried to upgrade it from 4 to 7 with IndexUpgrader, but I observed
> > > that IntFields aren't searchable anymore.
> > >
> > > What is the most efficient way to convert IntFields to IntPoints, which
> > > are stored and sortable?
> > >
> > > Thanks,
> > > Riccardo
