lucene-dev mailing list archives

From "Atri Sharma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8675) Divide Segment Search Amongst Multiple Threads
Date Thu, 31 Jan 2019 18:47:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757614#comment-16757614
] 

Atri Sharma commented on LUCENE-8675:
-------------------------------------

Thanks for the comments.

Having a multi-shard approach makes sense, but a search is still bottlenecked by the largest
segment it needs to scan. If there are many such large segments, that can become a problem.

While I agree that range queries might not benefit directly from parallel scans, other
queries (such as TermQueries) might benefit from a segment-parallel scan. In a typical
Elasticsearch interactive use case, we see spikes when a query hits a large segment.
Such cases can be optimized with parallel scans.

We should have a way of deciding whether a scan should be parallelized, and then let the
execution operator get a set of nodes to execute. That is probably outside the scope of
this JIRA, but I wanted to open this thread to get the conversation going.
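
To make the idea concrete, here is a minimal sketch in plain Java of splitting one
segment's document ID space into mutually exclusive ranges and scanning them on
separate threads. It does not use any Lucene API. The class name, the
MIN_DOCS_PER_SLICE threshold, and the IntPredicate standing in for a query are all
hypothetical, and the "should we parallelize" check is only a size-based placeholder
for whatever decision method is eventually agreed on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

public class SegmentRangeSearch {

    // Hypothetical threshold: a slice should cover at least this many documents,
    // so small segments stay on a single thread.
    static final int MIN_DOCS_PER_SLICE = 1000;

    // Placeholder decision: how many slices to use for a segment of maxDoc documents.
    static int sliceCount(int maxDoc, int maxThreads) {
        int bySize = maxDoc / MIN_DOCS_PER_SLICE;
        return Math.max(1, Math.min(maxThreads, bySize));
    }

    // Split [0, maxDoc) into contiguous, mutually exclusive ranges.
    // Each int[] is {startInclusive, endExclusive}.
    static List<int[]> splitRanges(int maxDoc, int slices) {
        List<int[]> ranges = new ArrayList<>();
        int base = maxDoc / slices;
        int remainder = maxDoc % slices;
        int start = 0;
        for (int i = 0; i < slices; i++) {
            // The first `remainder` slices take one extra document each.
            int end = start + base + (i < remainder ? 1 : 0);
            ranges.add(new int[] {start, end});
            start = end;
        }
        return ranges;
    }

    // Count matching documents, scanning each range on its own thread.
    // `matches` stands in for evaluating a query against one document ID.
    static long countMatches(int maxDoc, int maxThreads, IntPredicate matches)
            throws Exception {
        int slices = sliceCount(maxDoc, maxThreads);
        ExecutorService pool = Executors.newFixedThreadPool(slices);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int[] r : splitRanges(maxDoc, slices)) {
                partials.add(pool.submit(() -> {
                    long hits = 0;
                    for (int doc = r[0]; doc < r[1]; doc++) {
                        if (matches.test(doc)) {
                            hits++;
                        }
                    }
                    return hits;
                }));
            }
            // Merge the per-range results; ranges never overlap, so a sum is enough.
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long hits = countMatches(10_000, 4, doc -> doc % 3 == 0);
        System.out.println(hits); // prints 3334
    }
}
```

Counting is the easy case because the merge step is a plain sum. A collector that
keeps top hits would need a merge of per-range results instead, and early-terminating
collectors are harder still, which is why the issue leaves them for a second phase.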

> Divide Segment Search Amongst Multiple Threads
> ----------------------------------------------
>
>                 Key: LUCENE-8675
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8675
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Atri Sharma
>            Priority: Major
>
> Segment search is a single-threaded operation today, which can be a bottleneck for large
> analytical use cases that index a lot of data and run complex queries touching multiple
> segments (imagine a composite query with a range query and filters on top). This ticket is
> for discussing the idea of splitting the search of a single segment across multiple threads
> based on mutually exclusive document ID ranges.
> This will be a two-phase effort, the first phase targeting queries returning all matching
> documents (collectors not terminating early). The second phase patch will introduce staged
> execution and will build on top of this patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

