lucene-solr-user mailing list archives

From Rodrigo De Castro <rodr...@sacaluta.com>
Subject Re: High add/delete rate and index fragmentation
Date Fri, 04 Dec 2009 18:07:36 GMT
On Wed, Dec 2, 2009 at 2:43 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:

> It sounds like you're asking about near realtime search support,
> but I'm not sure. So here are a few ideas.
>
> #1 How often do you need to be able to search on the latest
> updates (as opposed to updates from, let's say, 10 minutes ago)?
>

You are right that we would need near-realtime support. The problem is not
so much making new records available quickly as guaranteeing that deleted
records are never returned. For this reason, our plan is to update and
search a single master index, provided that: (1) searching while records are
being updated is OK, (2) performance does not degrade substantially due to
fragmentation, (3) optimization does not impact search, and (4) we ensure
durability: if a node goes down, each update has already been replicated to
another node that can take over. It seems that 1 and 2 are not much of a
problem, and 3 would need to be tested. I would like to know more about how
4 has been addressed, so that we don't lose updates if a master goes down
between an update and the next index replication.
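
To make the "deleted records must not be returned" requirement concrete: in Solr a delete only stops showing up in results once a commit has opened a new searcher, so each delete needs to be followed by a commit before it is safe to search. Below is a minimal sketch that builds the two XML update messages involved. The event ids are hypothetical, and actually sending the messages to the update handler is left out, since the host and path depend on the deployment.

```python
import xml.etree.ElementTree as ET


def delete_message(ids):
    """Build a Solr XML <delete> message for the given unique-key values."""
    root = ET.Element("delete")
    for doc_id in ids:
        # One <id> child per document to delete.
        ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


def commit_message():
    """Build a Solr XML <commit> message (makes earlier deletes visible)."""
    return ET.tostring(ET.Element("commit"), encoding="unicode")


if __name__ == "__main__":
    # Hypothetical event ids, for illustration only.
    print(delete_message(["evt-1", "evt-2"]))
    print(commit_message())
```

Committing after every single delete would probably be expensive at a high delete rate, so in practice the deletes would likely be batched and the commit interval tuned against how stale a result is allowed to be.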


> #3 is a mixed bag at this point, and there is no official
> solution, yet. Shell scripts, and a load balancer could kind of
> work. Check out SOLR-1277 or SOLR-1395 for progress along these
> lines.
>

Thanks for the links.

Rodrigo


> On Wed, Dec 2, 2009 at 11:53 AM, Rodrigo De Castro <rodrigo@sacaluta.com> wrote:
> > We are considering Solr to store events which will be added to and deleted
> > from the index at a very fast rate. Solr will be used, in this case, to
> > find the right event we need to process (since they may have several
> > attributes and we may search for the best match based on the query
> > attributes). Our understanding is that the common use cases are those
> > wherein the read rate is much higher than the write rate, and deletes are
> > not as frequent, so we are not sure whether Solr handles our use case
> > well or whether it is the right fit. Given that, I have a few questions:
> >
> > 1 - How does Solr/Lucene degrade with fragmentation? That would probably
> > determine the rate at which we would need to optimize the index. I presume
> > that it depends on the rate of insertions and deletions, but would you
> > have any benchmark on this degradation? Or, in general, how has your
> > experience been with this use case?
> >
> > 2 - Optimizing seems to be a very expensive process. While optimizing the
> > index, how much does search performance degrade? In this case, a huge
> > degradation would not allow us to optimize unless we switch to another
> > copy of the index while the optimize is running.
> >
> > 3 - In terms of high availability, what has been your experience
> > detecting failure of the master and having a slave take over?
> >
> > Thanks,
> > Rodrigo
> >
>
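
One rough way to put a number on the fragmentation asked about in question 1 is the share of the index taken up by deleted documents. Lucene reports `maxDoc` (which still counts deleted documents that have not been merged away) and `numDocs` (live documents only), so the difference is the deleted count. The sketch below computes that fraction and compares it with a threshold; the 20% default is an arbitrary assumption for illustration, not a recommended value, and the function names are hypothetical.

```python
def deleted_fraction(max_doc, num_docs):
    """Fraction of index slots held by deleted documents.

    max_doc  -- Lucene maxDoc (includes not-yet-merged deletes)
    num_docs -- Lucene numDocs (live documents only)
    """
    if max_doc <= 0:
        return 0.0
    return (max_doc - num_docs) / max_doc


def should_optimize(max_doc, num_docs, threshold=0.2):
    """True when deletes exceed the (assumed, arbitrary) threshold."""
    return deleted_fraction(max_doc, num_docs) >= threshold


if __name__ == "__main__":
    # 1000 slots, 750 live documents: a quarter of the index is deletes.
    print(deleted_fraction(1000, 750))
    print(should_optimize(1000, 750))
```

Tracking this ratio over time under a realistic add/delete load would give a data point for how often an optimize is actually needed, rather than running it on a fixed schedule.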
