lucene-java-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: Incremental Field Updates
Date Tue, 08 Jul 2014 16:13:26 GMT
That's a cool patch. Thanks


On Thursday, July 3, 2014, Gopal Patwa <gopalpatwa@gmail.com> wrote:

> Thanks Ravi, it is good to know the general problems with updatable fields.
> In our use case we have a few fields that are updated more frequently than
> the main index. We are using this SOLR join contrib patch with a
> DocTransformer to return data from the join core. This approach has some
> performance impact, but if that hit is acceptable for your use case, you
> can give it a try if you are using SOLR.
>
> https://issues.apache.org/jira/browse/SOLR-4787
>
>
>
>
>
> On Thu, Jul 3, 2014 at 3:22 AM, Ravikumar Govindarajan <
> ravikumar.govindarajan@gmail.com> wrote:
>
> > In case of sorting, updatable DocValues may be what you are looking for.
> >
> > But updatable fields for searching are a different beast.
> >
> > A sample approach is documented at
> >
> > http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/
> >
> > The general problems with updatable postings lists, AFAIK, are:
> >
> > 1. Impossible to correctly score updated documents
> > 2. Segment merges could miss updates
> > 3. Might behave incorrectly with NRT
> > 4. Frequent updates could end up creating lots of files because of the
> >    append-only nature of Lucene...
> >
> > Maybe, if you are not too worried about scoring, correct NRT behavior,
> > etc., you can attempt a solution like the RedisCodec stuff...
> >
> > Segregating static & dynamic fields into 2 separate indexes, as described
> > here
> >
> > http://www.lucenerevolution.org/2013/Sidecar-Index-Solr-Components-for-Parallel-Index-Management
> >
> > may be of some use to you.
> >
> > --
> > Ravi
> >
> >
> >
> > On Wed, Jul 2, 2014 at 7:29 PM, Shai Erera <serera@gmail.com> wrote:
> >
> > > Using BinaryDocValues is not recommended for all scenarios. It is a
> > > "catchall" alternative to the other DocValues types. I would not use it
> > > unless it makes sense for your application, even if it means that you
> > > need to re-index a document in order to update a single field.
> > >
> > > DocValues are not good for "search" - by search I assume you mean taking
> > > a query such as "apache AND lucene" and finding all documents which
> > > contain both terms under the same field. They are good for sorting and
> > > faceting, though.
> > >
> > > So I guess the answer to your question is "it depends" (it always is!) -
> > > I would use DocValues for sorting and faceting, but not for regular
> > > search queries. And I would use BinaryDocValues only when the other
> > > DocValues types don't match.
> > >
> > > Also, note that the current field-level update of DocValues is not
> > > always better than re-indexing the document; you can read here for more
> > > details:
> > >
> > > http://shaierera.blogspot.com/2014/04/benchmarking-updatable-docvalues.html
> > >
> > > Shai
> > >
> > >
> > > On Tue, Jul 1, 2014 at 9:17 PM, Sandeep Khanzode <
> > > sandeep_khanzode@yahoo.com.invalid> wrote:
> > >
> > > > Hi Shai,
> > > >
> > > > So one follow-up question.
> > > >
> > > > Assume that my use case is to have ~50M documents indexed, with each
> > > > document having ~10-15 indexed but not stored fields. These fields
> > > > will never change, but there are another ~5-6 fields that will change
> > > > and will continue to change after the index is written. These ~5-6
> > > > fields may also be multivalued. The size of this index turns out to
> > > > be ~120GB.
> > > >
> > > > In this case, I would like to sort, facet, or search on these ~5-6
> > > > fields. Which approach do you suggest? Should I use BinaryDocValues
> > > > and update using IW, or use either a ParallelReader or a Join query?
> > > >
> > > > -----------------------
> > > > Thanks n Regards,
> > > > Sandeep Ramesh Khanzode
> > > >
> > > >
> > > > On Tuesday, July 1, 2014 9:53 PM, Shai Erera <serera@gmail.com> wrote:
> > > >
> > > >
> > > >
> > > > Except that Lucene now offers efficient numeric and binary DocValues
> > > > updates. See IndexWriter.updateNumeric/Binary...
> > > >
> > > > On Jul 1, 2014 5:51 PM, "Erick Erickson" <erickerickson@gmail.com>
> > > > wrote:
> > > >
> > > > > This JIRA is "complicated"; don't really expect it in 4.9, as it's
> > > > > been hanging around for quite a while. Everyone would like this,
> > > > > but it's not easy.
> > > > >
> > > > > Atomic updates will work, but you have to set stored="true" for all
> > > > > source fields. Under the covers this actually reads the document
> > > > > out of the stored fields, deletes the old one, and adds it
> > > > > over again.
> > > > >
> > > > > FWIW,
> > > > > Erick
> > > > >
> > > > > On Tue, Jul 1, 2014 at 5:32 AM, Sandeep Khanzode
> > > > > <sandeep_khanzode@yahoo.com.invalid> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I wanted to know the best approach to follow if a few fields in my
> > > > > > indexed documents are changing at run time (after indexing and
> > > > > > before or during search), but a majority of them are created at
> > > > > > index time.
> > > > > >
> > > > > > I could see the JIRA given below, but it is scheduled for Lucene
> > > > > > 4.9, I believe.
> > > > > >
> > > > > > There are a few other approaches, like maintaining a separate
> > > > > > index for the changing fields and using either a ParallelReader
> > > > > > or a Join.
> > > > > >
> > > > > > Can everyone share their experience with this scenario and how it
> > > > > > is handled in your systems? Thanks,
> > > > > >
> > > > > > [LUCENE-4258] Incremental Field Updates through Stacked Segments -
> > > > > > ASF JIRA
> > > > > >
> > > > > > Shai and I would like to start working on the proposal to
> > > > > > Incremental Field Updates outlined here
> > > > > > (http://markmail.org/message/zhrdxxpfk6qvdaex).
> > > > > >
> > > > > >
> > > > > > -----------------------
> > > > > > Thanks n Regards,
> > > > > > Sandeep Ramesh Khanzode
> > > > >
> > > > >
> > > > >
> > > > >
> > >
> >
>
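
Erick's description of atomic updates earlier in the thread (read the
document out of the stored fields, delete the old one, add it over again)
can be illustrated with a toy in-memory model. This is only a sketch and
not Solr or Lucene code: the class AtomicUpdateSketch and all of its
methods are hypothetical names invented for illustration. It assumes that
a field which was indexed but not stored cannot be read back, which is why
such a field is lost after the update.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of a read-modify-reindex update. Hypothetical, not a Solr/Lucene API. */
final class AtomicUpdateSketch {
    // doc id -> field values that can be read back (the "stored" fields)
    private final Map<String, Map<String, String>> stored = new HashMap<>();
    // doc id -> field values that queries can match (the "indexed" fields)
    private final Map<String, Map<String, String>> indexed = new HashMap<>();

    /** Indexes every field, but keeps only the fields named in storedNames retrievable. */
    void add(String id, Map<String, String> fields, Set<String> storedNames) {
        Map<String, String> kept = new HashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (storedNames.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        indexed.put(id, new HashMap<>(fields));
        stored.put(id, kept);
    }

    /** Removes the document entirely. */
    void delete(String id) {
        indexed.remove(id);
        stored.remove(id);
    }

    /** Updates one field by rebuilding the whole document from its stored fields. */
    void atomicUpdate(String id, String field, String value) {
        Map<String, String> current = stored.get(id);
        if (current == null) {
            throw new IllegalArgumentException("no such document: " + id);
        }
        Map<String, String> rebuilt = new HashMap<>(current); // 1. read stored fields
        rebuilt.put(field, value);                            // 2. apply the change
        delete(id);                                           // 3. delete the old document
        add(id, rebuilt, rebuilt.keySet());                   // 4. add it over again
    }

    /** Returns true if the document's indexed value for the field equals value. */
    boolean matches(String id, String field, String value) {
        Map<String, String> doc = indexed.get(id);
        return doc != null && value.equals(doc.get(field));
    }
}
```

If a document is added with fields title, body and views, where only title
and views are stored, then after atomicUpdate on views the document still
matches on title and on the new views value, but no longer matches on
body, because body could not be recovered when the document was rebuilt.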
