jackrabbit-oak-issues mailing list archives

From "Tommaso Teofili (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (OAK-3129) SolrQueryIndex making too many Solr requests per jCR query
Date Tue, 21 Jul 2015 15:24:04 GMT

     [ https://issues.apache.org/jira/browse/OAK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommaso Teofili resolved OAK-3129.
----------------------------------
    Resolution: Fixed

Fixed: the first request uses the 'rows' setting, e.g. fetching the first 10k rows. Then,
if the query matches fewer than 30k entries, it fetches 10k at a time while traversing
the cursor; otherwise it makes 2 further requests and fetches 'numFound' / 2 docs per
request.
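
The paging rule described above could be sketched roughly as follows. This is an
illustration only, not the code committed for OAK-3129: the class AdaptivePaging and the
method plan are hypothetical names, and the cut-off of 3 * rows is an assumption
generalized from the 10k / 30k example. Each entry in the result is a {start, rows} pair
for one Solr request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Oak: plans the start/rows parameters of the
// Solr requests needed to read a whole result set.
class AdaptivePaging {

    // rows:     the configured 'rows' setting (size of the first request)
    // numFound: the total number of matches reported by the first response
    static List<long[]> plan(long rows, long numFound) {
        List<long[]> pages = new ArrayList<>();
        // The first request always uses the configured 'rows' setting.
        pages.add(new long[] {0, rows});
        if (numFound <= rows) {
            return pages; // everything fit in the first request
        }
        if (numFound < 3 * rows) {
            // Small result set (assumed cut-off): keep fetching 'rows' docs at a time.
            for (long start = rows; start < numFound; start += rows) {
                pages.add(new long[] {start, rows});
            }
        } else {
            // Large result set: at most 2 more requests of numFound / 2 docs each.
            long half = numFound / 2;
            pages.add(new long[] {rows, half});
            long nextStart = rows + half;
            if (nextStart < numFound) {
                pages.add(new long[] {nextStart, half});
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        for (long[] page : plan(10_000, 100_000)) {
            System.out.println("start=" + page[0] + " rows=" + page[1]);
        }
    }
}
```

With rows = 10000 and numFound = 100000, this plans 3 requests in total, where fixed
paging of 10k docs per request would need 10.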

> SolrQueryIndex making too many Solr requests per jCR query
> ----------------------------------------------------------
>
>                 Key: OAK-3129
>                 URL: https://issues.apache.org/jira/browse/OAK-3129
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: solr
>    Affects Versions: 1.2.2, 1.3.2, 1.0.17
>            Reporter: Tommaso Teofili
>            Assignee: Tommaso Teofili
>             Fix For: 1.2.4, 1.3.3, 1.0.18
>
>
> {{SolrQueryIndex}} and {{FilterQueryParser}} use the {{OakSolrConfiguration#getRows}}
setting in order to set the number of documents that should be fetched in batches while iterating
the {{Cursor}} resulting from a certain query.
> While this is an optimization that avoids loading all the results in memory in cases
where only e.g. the first 10 results of the {{Cursor}} are visited, it tends to perform really
badly when the result set's cardinality is 10 or more times larger than the 'rows' setting,
because for each JCR query, 10 or more Solr queries are performed (each adding network and
Solr call latency).
> To avoid that, we could use the 'rows' setting to perform the first request to Solr and
then adapt the subsequent paged requests (controlled by the start and rows Solr HTTP
parameters) so that the rest of the result set is fetched in no more than 2 Solr queries.
This can be done by looking at the _numFound_ value in Solr's response to the first query
and setting the start/rows parameters accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
