lucene-java-user mailing list archives

From Desidero <desid...@gmail.com>
Subject Re: How to make good use of the multithreaded IndexSearcher?
Date Tue, 01 Oct 2013 19:58:45 GMT
Benson,

Rather than forcing a random number of small segments into the index using
maxMergedSegmentMB, it might be better to split your index into multiple
shards. You can create a specific number of balanced shards to control the
parallelism and then forceMerge each shard down to 1 segment to avoid
spawning extra threads per shard. Once that's done, you just open all of
the shards with a MultiReader and use that with the IndexSearcher and an
ExecutorService.
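
A minimal, library-free sketch of the shard-assignment step, assuming documents are routed round-robin by their ordinal so that shard sizes differ by at most one. ShardRouter, shardFor and shardSizes are hypothetical names for illustration, not Lucene APIs. Writing each shard, running forceMerge on it and opening the shards through a MultiReader are not shown.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Lucene): round-robin routing to a fixed number of shards. */
final class ShardRouter {

    /** Returns the shard index in [0, numShards) for the docOrdinal-th document (0-based). */
    static int shardFor(long docOrdinal, int numShards) {
        if (docOrdinal < 0 || numShards <= 0) {
            throw new IllegalArgumentException("docOrdinal must be >= 0 and numShards > 0");
        }
        return (int) (docOrdinal % numShards);
    }

    /** Returns how many documents each shard holds after routing totalDocs documents. */
    static long[] shardSizes(long totalDocs, int numShards) {
        if (totalDocs < 0 || numShards <= 0) {
            throw new IllegalArgumentException("totalDocs must be >= 0 and numShards > 0");
        }
        long[] sizes = new long[numShards];
        long base = totalDocs / numShards;
        long remainder = totalDocs % numShards;
        for (int shard = 0; shard < numShards; shard++) {
            // The first `remainder` shards receive one extra document.
            sizes[shard] = base + (shard < remainder ? 1 : 0);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Ten documents over four shards: sizes differ by at most one.
        System.out.println(Arrays.toString(shardSizes(10, 4)));
    }
}
```

The number of shards is the knob here: with one segment per shard, it is also the number of per-segment tasks a query can fan out to.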

The downside to this is that it doesn't play nicely with near real-time
search, but if you have a relatively static index that gets pushed to
slaves periodically it gets the job done.

As Mike said, it'd be nicer if there was a way to split the docID space
into virtual shards, but it's not currently available. I'm not sure if
anyone is even looking into it.
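
For illustration only, the range arithmetic behind the virtual-shard idea could look like the sketch below. It is not an existing Lucene feature: DocIdRanges and split are hypothetical names, and the sketch only computes the half-open docID ranges that each thread would be handed. It does not search anything.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Lucene): carves [0, maxDoc) into contiguous docID ranges. */
final class DocIdRanges {

    /** Returns numShards half-open ranges {start, end} that together cover [0, maxDoc). */
    static int[][] split(int maxDoc, int numShards) {
        if (maxDoc < 0 || numShards <= 0) {
            throw new IllegalArgumentException("maxDoc must be >= 0 and numShards > 0");
        }
        int[][] ranges = new int[numShards][2];
        int base = maxDoc / numShards;
        int remainder = maxDoc % numShards;
        int start = 0;
        for (int shard = 0; shard < numShards; shard++) {
            // The first `remainder` ranges are one document longer.
            int length = base + (shard < remainder ? 1 : 0);
            ranges[shard][0] = start;
            ranges[shard][1] = start + length;
            start += length;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Mike's example: 100M docs over 10 virtual shards of 10M docs each.
        System.out.println(Arrays.deepToString(split(100_000_000, 10)));
    }
}
```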

Regards,
Matt


On Tue, Oct 1, 2013 at 7:09 AM, Michael McCandless <lucene@mikemccandless.com> wrote:

> You might want to set a smallish maxMergedSegmentMB in
> TieredMergePolicy to "force" enough segments in the index ... sort of
> the opposite of optimizing.
>
> Really, IndexSearcher's approach to using one thread per segment is
> rather silly, and, it's annoying/bad to expose change in behavior due
> to segment structure.
>
> I think it'd be better to carve up the overall docID space into N
> virtual shards.  Ie, if you have 100M docs, then one thread searches
> docs 0-10M, another 10M-20M, etc.  Nobody has created such a searcher
> impl but it should not be hard and it would be agnostic to the segment
> structure.
>
> But then again, this need (using concurrent hardware to reduce latency
> of a single query) is somewhat rare; most apps are fine using the
> concurrency across queries rather than within one query.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Oct 1, 2013 at 7:09 AM, Adrien Grand <jpountz@gmail.com> wrote:
> > Hi Benson,
> >
> > On Mon, Sep 30, 2013 at 5:21 PM, Benson Margulies <benson@basistech.com> wrote:
> >> The multithreaded index searcher fans out across segments. How aggressively
> >> does 'optimize' reduce the number of segments? If the segment count goes
> >> way down, is there some other way to exploit multiple cores?
> >
> > forceMerge[1], formerly known as optimize, takes a parameter to
> > configure how many segments should remain in the index.
> >
> > Regarding multi-core usage, if your query load is high enough to use
> > all your CPUs (there are always #cores queries running in parallel),
> > there is generally no need to use the multi-threaded IndexSearcher.
> > The multi-threaded index searcher can however help in case all CPU
> > power is not in use or if you care more about latency than throughput.
> > It indeed leverages the fact that the index is split into segments
> > to parallelize query execution, so a fully merged index will actually
> > run the query in a single thread in any case.
> >
> > There is no way to make query execution efficiently use several cores
> > on a single-segment index so if you really want to parallelize query
> > execution, you will have to shard the index to do at the index level
> > what the multi-threaded IndexSearcher does at the segment level.
> >
> > Side notes:
> >  - A single-segment index is only more efficient for queries that are
> > terms-dictionary-intensive; running forceMerge on an index is
> > generally discouraged unless that index is read-only.
> >  - The multi-threaded index searcher only parallelizes query execution
> > in certain cases. In particular, it never parallelizes execution when
> > the method takes a collector. This means that if you want to use
> > TotalHitCountCollector to count matches, you will have to do the
> > parallelization by yourself.
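
A library-free sketch of what doing the parallelization yourself can look like for hit counting: one counting task per segment, with the partial counts summed at the end. This is an assumption-laden illustration, not Lucene code. Segments are modelled as plain int arrays, the query as an IntPredicate, and ParallelHitCount is a hypothetical name. In Lucene the per-segment work would go through the reader's leaves and a collector, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

/** Hypothetical helper (not part of Lucene): counts matches with one task per segment. */
final class ParallelHitCount {

    /** Submits one counting task per segment and returns the sum of the partial counts. */
    static long count(List<int[]> segments, IntPredicate matches, ExecutorService executor) {
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int[] segment : segments) {
            tasks.add(() -> {
                long hits = 0;
                for (int value : segment) {
                    if (matches.test(value)) {
                        hits++;
                    }
                }
                return hits;
            });
        }
        try {
            long total = 0;
            // invokeAll blocks until every per-segment task has finished.
            for (Future<Long> partial : executor.invokeAll(tasks)) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a counting task failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<int[]> segments = Arrays.asList(new int[]{1, 2, 3}, new int[]{4, 5, 6, 7});
            // Counts the even values across both "segments".
            System.out.println(count(segments, v -> v % 2 == 0, executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

Because each task keeps its own local counter and the totals are only combined after the tasks finish, no shared mutable state or locking is needed.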
> >
> > [1] http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/index/IndexWriter.html#forceMerge%28int%29
> >
> > --
> > Adrien
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
