lucene-java-user mailing list archives

From Gopal Patwa <gopalpa...@gmail.com>
Subject Re: Incremental Field Updates
Date Thu, 03 Jul 2014 17:47:07 GMT
Thanks Ravi, it is good to know the general problems with updatable
fields. In our use case, we have a few fields that are updated more
frequently than the main index. We are using this SOLR join contrib patch
with a DocTransformer to return data from the join core. This approach has
some performance impact, but if that hit is acceptable for your use case,
you can give it a try if you are using SOLR.

https://issues.apache.org/jira/browse/SOLR-4787
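
As a rough illustration of the general idea of keeping frequently updated
fields outside the main index and joining them in at query time, here is a
minimal in-memory sketch in plain Java. It is not the SOLR-4787 patch and
uses no Lucene or Solr API; the class SidecarJoinSketch and its methods are
hypothetical names made up for this example. The per-hit lookup into the
second map stands in for the join cost.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of "separate index plus join"; not Lucene or Solr code.
// All names are hypothetical.
final class SidecarJoinSketch {
    // key -> static text, written once and never touched by updates
    private final Map<String, String> mainIndex = new HashMap<>();
    // key -> frequently updated numeric value, kept outside the main index
    private final Map<String, Long> sidecar = new HashMap<>();

    void indexStatic(String key, String text) {
        mainIndex.put(key, text);
    }

    // Updating the dynamic value leaves the main index untouched.
    void updateDynamic(String key, long value) {
        sidecar.put(key, value);
    }

    // Finds keys whose static text contains the term, then sorts the hits
    // by the dynamic value, highest first. Missing values count as 0.
    List<String> searchSortedByDynamic(String term) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : mainIndex.entrySet()) {
            if (entry.getValue().contains(term)) {
                hits.add(entry.getKey());
            }
        }
        // Each comparison looks up the sidecar map: this is the join cost.
        hits.sort(Comparator
                .comparingLong((String key) -> sidecar.getOrDefault(key, 0L))
                .reversed());
        return hits;
    }
}
```

Calling updateDynamic changes the sort order of later searches without
re-indexing any static text, which is the trade-off being made: cheaper
updates in exchange for extra work on every query.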





On Thu, Jul 3, 2014 at 3:22 AM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> In case of sorting, updatable DocValues may be what you are looking for.
>
> But updatable fields for searching is a different beast.
>
> A sample approach is documented at
>
> http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/
>
> The general problems with updatable postings lists, AFAIK, are:
>
> 1. Impossible to correctly score updated documents
> 2. Segment merges could miss updates
> 3. Might behave incorrectly with NRT
> 4. Frequent updates could end up creating lots of files because of the
>     append-only nature of Lucene...
>
> Maybe, if you are not too worried about scoring, correct NRT behavior etc.,
> you can attempt a solution like the RedisCodec stuff...
>
> Segregating static & dynamic fields into 2 separate indexes, as described
> here:
>
> http://www.lucenerevolution.org/2013/Sidecar-Index-Solr-Components-for-Parallel-Index-Management
>
> may be of some use to you.
>
> --
> Ravi
>
>
>
> On Wed, Jul 2, 2014 at 7:29 PM, Shai Erera <serera@gmail.com> wrote:
>
> > Using BinaryDocValues is not recommended for all scenarios. It is a
> > "catchall" alternative to the other DocValues types. I would not use it
> > unless it makes sense for your application, even if it means that you
> > need to re-index a document in order to update a single field.
> >
> > DocValues are not good for "search" - by search I assume you mean take a
> > query such as "apache AND lucene" and find all documents which contain
> > both terms under the same field. They are good for sorting and faceting
> > though.
> >
> > So I guess the answer to your question is "it depends" (it always is!) -
> > I would use DocValues for sorting and faceting, but not for regular
> > search queries. And I would use BinaryDocValues only when the other
> > DocValues types don't match.
> >
> > Also, note that the current field-level update of DocValues is not always
> > better than re-indexing the document, you can read here for more details:
> >
> > http://shaierera.blogspot.com/2014/04/benchmarking-updatable-docvalues.html
> >
> > Shai
> >
> >
> > On Tue, Jul 1, 2014 at 9:17 PM, Sandeep Khanzode <
> > sandeep_khanzode@yahoo.com.invalid> wrote:
> >
> > > Hi Shai,
> > >
> > > So one follow-up question.
> > >
> > > Assume that my use case is to have approx. ~50M documents indexed with
> > > each document having about ~10-15 indexed but not stored fields. These
> > > fields will never change, but there are another ~5-6 fields that will
> > > change and will continue to change after the index is written. These
> > > ~5-6 fields may also be multivalued. The size of this index turns out
> > > to be ~120GB.
> > >
> > > In this case, I would like to sort or facet or search on these ~5-6
> > > fields. Which approach do you suggest? Should I use BinaryDocValues and
> > > update using IW, or use either a ParallelReader or a Join query?
> > >
> > > -----------------------
> > > Thanks n Regards,
> > > Sandeep Ramesh Khanzode
> > >
> > >
> > > On Tuesday, July 1, 2014 9:53 PM, Shai Erera <serera@gmail.com> wrote:
> > >
> > >
> > >
> > > Except that Lucene now offers efficient numeric and binary DocValues
> > > updates. See IndexWriter.updateNumeric/Binary...
> > >
> > > On Jul 1, 2014 5:51 PM, "Erick Erickson" <erickerickson@gmail.com>
> > > wrote:
> > >
> > > > This JIRA is "complicated", don't really expect it in 4.9 as it's
> > > > been hanging around for quite a while. Everyone would like this,
> > > > but it's not easy.
> > > >
> > > > Atomic updates will work, but you have to set stored="true" for all
> > > > source fields. Under the covers this actually reads the document
> > > > out of the stored fields, deletes the old one and adds it
> > > > over again.
> > > >
> > > > FWIW,
> > > > Erick
> > > >
> > > > On Tue, Jul 1, 2014 at 5:32 AM, Sandeep Khanzode
> > > > <sandeep_khanzode@yahoo.com.invalid> wrote:
> > > > > Hi,
> > > > >
> > > > > I wanted to know of the best approach to follow if a few fields in
> > > > > my indexed documents are changing at run time (after index and
> > > > > before or during search), but a majority of them are created at
> > > > > index time.
> > > > >
> > > > > I could see the JIRA given below but it is scheduled for Lucene
> > > > > 4.9, I believe.
> > > > >
> > > > > There are a few other approaches, like maintaining a separate index
> > > > > for changing fields and use either a parallelreader or use a Join.
> > > > >
> > > > > Can everyone share their experience for this scenario on how it is
> > > > > handled in your systems? Thanks,
> > > > >
> > > > > [LUCENE-4258] Incremental Field Updates through Stacked Segments -
> > > > > ASF JIRA
> > > > >
> > > > > Shai and I would like to start working on the proposal to
> > > > > Incremental Field Updates outlined here
> > > > > (http://markmail.org/message/zhrdxxpfk6qvdaex).
> > > > >
> > > > >
> > > > > -----------------------
> > > > > Thanks n Regards,
> > > > > Sandeep Ramesh Khanzode
> > > >
> >
>
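
The atomic update behaviour Erick describes in the quoted thread (read the
document out of the stored fields, delete the old one, add it again) can be
sketched with a toy in-memory model. This is plain Java, not Lucene or Solr
code; AtomicUpdateSketch, Doc and the method names are hypothetical. It only
aims to show why fields that are not stored cannot survive such an update.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model, not Lucene or Solr code. All names are hypothetical.
final class AtomicUpdateSketch {

    // One document: stored fields can be read back, indexed-only fields cannot.
    static final class Doc {
        final Map<String, String> stored;
        final Map<String, String> indexedOnly;

        Doc(Map<String, String> stored, Map<String, String> indexedOnly) {
            this.stored = new HashMap<>(stored);
            this.indexedOnly = new HashMap<>(indexedOnly);
        }
    }

    // internal doc number -> document
    private final Map<Integer, Doc> docs = new HashMap<>();
    // unique key -> doc number of the live version
    private final Map<String, Integer> liveByKey = new HashMap<>();
    private int nextDocNumber = 0;

    // Adds a document, deleting any previous document with the same key.
    int add(String key, Map<String, String> stored,
            Map<String, String> indexedOnly) {
        Integer previous = liveByKey.get(key);
        if (previous != null) {
            docs.remove(previous); // old version is deleted, not edited in place
        }
        int docNumber = nextDocNumber++;
        docs.put(docNumber, new Doc(stored, indexedOnly));
        liveByKey.put(key, docNumber);
        return docNumber;
    }

    // Changes one field by reading stored fields, then deleting and re-adding.
    int atomicUpdate(String key, String field, String newValue) {
        Integer current = liveByKey.get(key);
        if (current == null) {
            throw new IllegalArgumentException("no document with key " + key);
        }
        Map<String, String> rebuilt = new HashMap<>(docs.get(current).stored);
        rebuilt.put(field, newValue);
        // Only stored fields could be read back, so indexed-only fields are lost.
        return add(key, rebuilt, new HashMap<>());
    }

    // Returns the live version of a document, or null if the key is unknown.
    Doc get(String key) {
        Integer current = liveByKey.get(key);
        return current == null ? null : docs.get(current);
    }
}
```

In this model, calling atomicUpdate on a document that was added with a
non-empty indexedOnly map returns a new internal doc number and leaves the
rebuilt document with an empty indexedOnly map.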
