lucene-dev mailing list archives

From "mosh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-13125) Optimize Queries when sorting by router.field
Date Tue, 08 Jan 2019 12:47:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737089#comment-16737089 ]

mosh edited comment on SOLR-13125 at 1/8/19 12:46 PM:
------------------------------------------------------

{quote}However, we could optimize knowing which shards to fetch docs from for the top-X if
we know how many docs matched the query in the first phase
{quote}
That sounds like it could save a lot of needless fetching in a large cluster.
How would you tackle this?
I was thinking a new SearchComponent might suffice.
WDYT?
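One way to picture the "which shards to fetch from" idea: suppose the first phase yields numFound per collection, and the collections are ordered newest first. Because TRA collections cover disjoint time ranges, every hit in one collection sorts ahead of every hit in the next when sorting by router.field desc, so the requested window maps to a contiguous run of collections found with a cumulative count. This is a minimal Java sketch under those assumptions; TraFetchPlanner and collectionsToFetch are hypothetical names, not Solr APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class TraFetchPlanner {

    /**
     * Hypothetical helper. numFound[i] is the first-phase hit count of the
     * i-th collection, with collections ordered newest first (router.field desc).
     * Because TRA collections cover disjoint time ranges, every hit in
     * collection i sorts ahead of every hit in collection i + 1, so the
     * requested window [start, start + rows) maps to a contiguous run of
     * collections. Returns the indices of collections that need a fetch.
     */
    public static List<Integer> collectionsToFetch(long[] numFound, long start, long rows) {
        List<Integer> needed = new ArrayList<>();
        long end = start + rows; // exclusive end of the requested window
        long offset = 0;         // global sort position of collection i's first hit
        for (int i = 0; i < numFound.length && offset < end; i++) {
            long next = offset + numFound[i];
            // Fetch only non-empty collections whose hits overlap the window
            if (numFound[i] > 0 && next > start) {
                needed.add(i);
            }
            offset = next;
        }
        return needed;
    }

    public static void main(String[] args) {
        long[] numFound = {40, 0, 35, 500, 1200};
        System.out.println(collectionsToFetch(numFound, 0, 100));   // [0, 2, 3]
        System.out.println(collectionsToFetch(numFound, 100, 100)); // [3]
    }
}
```

With these made-up counts, the top 100 hits need only collections 0, 2 and 3, and the second page only collection 3; the oldest collection is never fetched in either case.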

Something I noticed while skimming through SolrJ: aliases are currently resolved in the client,
but no indication of router.name is sent.
Perhaps this should be changed so SolrJ requests are easier for SolrCloud nodes to interpret,
eliminating the need to check ZooKeeper for _router.name_.


was (Author: moshebla):
{quote}However, we could optimize knowing which shards to fetch docs from for the top-X if
we know how many docs matched the query in the first phase{quote}
That sounds like it could save a lot of needless fetching in a large cluster.
How would you tackle this?
I was thinking a new SearchComponent could suffice.
WDYT?

Something I noticed while skimming through SolrJ: aliases are currently resolved in the client,
but no indication of router.name is sent.
Perhaps this should be changed so SolrJ requests are easier for SolrCloud nodes to interpret,
eliminating the need to check ZooKeeper for _router.name_.

> Optimize Queries when sorting by router.field
> ---------------------------------------------
>
>                 Key: SOLR-13125
>                 URL: https://issues.apache.org/jira/browse/SOLR-13125
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: mosh
>            Priority: Minor
>
> We are currently testing a TRA (Time Routed Alias) on Solr 7.7, with more than 300 shards in the alias and much more growth expected in the coming months.
> The "hot" data (in our case, the more recent data) will be stored on stronger nodes (SSD, more RAM, etc.).
> A proposal has emerged to optimize queries sorted by router.field (the field TRA uses to route each document to the correct collection).
> Perhaps, for queries sorted by router.field, Solr could be smart enough to wait for the more recent collections and, once the limit is reached, cancel the remaining queries (or simply not block waiting for their results)?
> For example, consider querying a TRA with a filter on a field other than router.field, sorted by router.field desc with limit=100.
> Since this is a TRA, Solr will issue queries to all the collections in the alias.
> To optimize this particular type of query, Solr could instead wait for the most recent collection in the TRA and check whether its result set meets or exceeds the limit. If so, the results could be returned to the user without waiting for the rest of the shards. If not, the issuing node would block until the second query returns, and so forth, until the limit of the request is reached.
> This might also be useful for deep paging: query each collection in turn, and move on to the next one only once the current collection has no more results.
> Thoughts or inputs are always welcome.
> This is just my two cents, and I'm always happy to brainstorm.
> Thanks in advance.
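The sequential strategy proposed above, including the deep-paging variant, could look roughly like the following Java sketch. It is not Solr code: Page, CollectionSearcher, query and inMemory are hypothetical names, and it assumes the collections are passed newest first and that each collection returns its hits sorted by router.field desc, so concatenating pages in alias order yields the globally sorted result. Deep paging is handled by carrying the unconsumed part of `start` over to the next collection using each collection's numFound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SequentialTraQuery {

    /** Hypothetical per-collection response: one page of doc ids plus the total hit count. */
    public static final class Page {
        final List<String> ids;
        final long numFound;

        Page(List<String> ids, long numFound) {
            this.ids = ids;
            this.numFound = numFound;
        }
    }

    /** Hypothetical per-collection search, returning hits sorted by router.field desc. */
    public interface CollectionSearcher {
        Page search(String collection, long start, int rows);
    }

    /**
     * Queries the collections one at a time, newest first, and stops as soon
     * as `rows` hits have been collected, so older collections are never queried.
     * `start` is consumed across collections to support deep paging.
     */
    public static List<String> query(List<String> newestFirst, CollectionSearcher searcher,
                                     long start, int rows) {
        List<String> results = new ArrayList<>();
        long toSkip = start; // hits that still have to be skipped
        for (String collection : newestFirst) {
            int wanted = rows - results.size();
            if (wanted <= 0) {
                break; // limit reached: the remaining collections are never queried
            }
            Page page = searcher.search(collection, toSkip, wanted);
            results.addAll(page.ids);
            // Skipped hits that this collection could not cover carry over to the next one
            toSkip = Math.max(0, toSkip - page.numFound);
        }
        return results;
    }

    /** In-memory stand-in for real collections; records which collections were queried. */
    static CollectionSearcher inMemory(Map<String, List<String>> data, List<String> queried) {
        return (collection, start, rows) -> {
            queried.add(collection);
            List<String> all = data.get(collection);
            int from = (int) Math.min(start, all.size());
            int to = Math.min(all.size(), from + rows);
            return new Page(new ArrayList<>(all.subList(from, to)), all.size());
        };
    }
}
```

With in-memory collections c3 (newest, hits a, b, c), c2 (d, e) and c1 (f, g, h), a first page of four hits is served from c3 and c2 alone and c1 is never queried, while start=4, rows=3 returns e, f, g. The trade-off is latency: collections are queried serially rather than in parallel, which likely pays off only when the newest collections usually satisfy the limit.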



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
