lucene-dev mailing list archives

From Andrés de la Peña (JIRA) <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7255) Paging with SortingMergePolicy and EarlyTerminatingSortingCollector
Date Tue, 26 Apr 2016 12:55:12 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258025#comment-15258025 ]

Andrés de la Peña commented on LUCENE-7255:
-------------------------------------------

I was thinking of being able to build an {{EarlyTerminatingSortingCollector}} with just the
number of wanted documents, either by changing the meaning of the argument:
{code}
- public EarlyTerminatingSortingCollector(Collector in, Sort sort, int numDocsToCollect, Sort mergePolicySort) {
-   ...
- }

+ public EarlyTerminatingSortingCollector(Collector in, Sort sort, int numWanted, Sort mergePolicySort) {
+   ...
+ }
{code}
or maybe by adding a new method:
{code}
+ public static EarlyTerminatingSortingCollector buildWithWanted(Collector in, Sort sort, int numWanted, Sort mergePolicySort) {
+   ...
+ }
{code}
This way, if possible, the paging state managed by users would consist only of the last
{{FieldDoc}}, as it does with other collectors. Otherwise, if I'm right, users of sorted
indexes would have to keep a paging state made of both the last {{FieldDoc}} and the number
of documents already collected. A hypothetical {{IndexSearcher}} aware of index sorting,
such as the one proposed in [LUCENE-6766|https://issues.apache.org/jira/browse/LUCENE-6766],
would then need to change its {{searchAfter}} method to require the number of documents to skip.
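To illustrate the difference, here is a minimal, Lucene-free sketch of the proposed {{numWanted}} semantics. It is not the Lucene API. One segment already sorted by the merge-policy sort is modelled as an {{int[]}}, and the last returned value stands in for the last {{FieldDoc}}. The names {{NumWantedSketch}}, {{collect}} and {{allPages}} are hypothetical. Documents before the page start are skipped without counting toward the limit, so the last value is the only paging state needed.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Lucene-free sketch of the proposed "numWanted" semantics (hypothetical names).
 * A single segment already sorted by the merge-policy sort is an int[];
 * the last returned int stands in for the last FieldDoc.
 */
public class NumWantedSketch {

    /**
     * Collects up to numWanted values that sort after {@code after}
     * (null means the first page). Values at or before {@code after} are
     * skipped and do NOT count toward the limit.
     */
    public static List<Integer> collect(int[] sortedSegment, Integer after, int numWanted) {
        List<Integer> hits = new ArrayList<>();
        for (int value : sortedSegment) {
            if (hits.size() == numWanted) {
                break; // early termination: the segment is sorted, nothing better follows
            }
            if (after != null && value <= after) {
                continue; // before the page start: skipped, not counted
            }
            hits.add(value);
        }
        return hits;
    }

    /** Pages through the segment keeping only the last value as paging state. */
    public static List<List<Integer>> allPages(int[] sortedSegment, int pageSize) {
        List<List<Integer>> pages = new ArrayList<>();
        Integer after = null; // the entire paging state
        while (true) {
            List<Integer> page = collect(sortedSegment, after, pageSize);
            if (page.isEmpty()) {
                break;
            }
            pages.add(page);
            after = page.get(page.size() - 1);
            if (page.size() < pageSize) {
                break; // last, partial page
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        int[] segment = new int[25];
        for (int i = 0; i < segment.length; i++) {
            segment[i] = i + 1; // values 1..25, already in sort order
        }
        // Three pages: 1..10, 11..20 and 21..25.
        for (List<Integer> page : allPages(segment, 10)) {
            System.out.println(page);
        }
    }
}
```

The sketch assumes distinct sort values. With ties, the "after" comparison would also need a tie-breaker, as Lucene's {{searchAfter}} does by breaking ties on the document ID.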



> Paging with SortingMergePolicy and EarlyTerminatingSortingCollector
> -------------------------------------------------------------------
>
>                 Key: LUCENE-7255
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7255
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 5.3, 5.4, 5.5, 6.0
>            Reporter: Andrés de la Peña
>              Labels: EarlyTerminatingSortingCollector, pagination, paging, searchafter, sortingmergepolicy
>
> {{EarlyTerminatingSortingCollector}} does not seem to work when used with a {{TopDocsCollector}}
> that searches for documents after a certain {{FieldDoc}}; that is, it can't be used for paging.
> The following code reproduces the problem:
> {code}
> // Sort to be used both with merge policy and queries
> Sort sort = new Sort(new SortedNumericSortField(FIELD_NAME, SortField.Type.INT));
> // Create directory
> RAMDirectory directory = new RAMDirectory();
> // Setup merge policy
> TieredMergePolicy tieredMergePolicy = new TieredMergePolicy();
> SortingMergePolicy sortingMergePolicy = new SortingMergePolicy(tieredMergePolicy, sort);
> // Setup index writer
> IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new SimpleAnalyzer());
> indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
> indexWriterConfig.setMergePolicy(sortingMergePolicy);
> IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
> // Index values
> for (int i = 1; i <= 1000; i++) {
>     Document document = new Document();
>     document.add(new NumericDocValuesField(FIELD_NAME, i));
>     indexWriter.addDocument(document);
> }
> // Force index merge to ensure early termination
> indexWriter.forceMerge(1, true);
> indexWriter.commit();
> // Create index searcher
> IndexReader reader = DirectoryReader.open(directory);
> IndexSearcher searcher = new IndexSearcher(reader);
> // Paginated read
> int pageSize = 10;
> FieldDoc pageStart = null;
> while (true) {
>     logger.info("Collecting page starting at: {}", pageStart);
>     Query query = new MatchAllDocsQuery();
>     TopDocsCollector tfc = TopFieldCollector.create(sort, pageSize, pageStart, true, false, false);
>     EarlyTerminatingSortingCollector collector = new EarlyTerminatingSortingCollector(tfc, sort, pageSize, sort);
>     searcher.search(query, collector);
>     ScoreDoc[] scoreDocs = tfc.topDocs().scoreDocs;
>     for (ScoreDoc scoreDoc : scoreDocs) {
>         pageStart = (FieldDoc) scoreDoc;
>         logger.info("FOUND {}", scoreDoc);
>     }
>     logger.info("Terminated early: {}", collector.terminatedEarly());
>     if (scoreDocs.length < pageSize) break;
> }
> // Close
> reader.close();
> indexWriter.close();
> directory.close();
> {code}
> The query for the second page doesn't return any results. However, it returns the expected
> results if we don't wrap the {{TopFieldCollector}} with the {{EarlyTerminatingSortingCollector}}.
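The behaviour can be reproduced without Lucene in a small model. The model assumes that {{EarlyTerminatingSortingCollector}} counts every document passed to the wrapped collector, including those the {{searchAfter}} check then rejects. It covers one segment with values 1..1000, matching {{forceMerge(1)}} above; {{EarlyTerminationModel}} and {{page}} are hypothetical names.

With a limit of {{pageSize}}, the second page only ever sees documents 1..10, all of which precede {{pageStart}}, so it comes back empty. Raising the limit to the number of documents already collected plus {{pageSize}}, the paging state described in the comment above, returns the expected page.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Lucene-free model of the reported behaviour (hypothetical names).
 * One segment whose values are already sorted is an int[]; the "after"
 * check stands in for TopFieldCollector's searchAfter filter and
 * numDocsToCollect for EarlyTerminatingSortingCollector's per-segment limit.
 */
public class EarlyTerminationModel {

    /**
     * Returns up to pageSize values greater than {@code after} (null = first page).
     * When numDocsToCollect > 0, collection stops once that many documents have
     * reached the collector, whether or not the paging filter kept them.
     */
    public static List<Integer> page(int[] sortedSegment, Integer after, int pageSize, int numDocsToCollect) {
        List<Integer> hits = new ArrayList<>();
        int seen = 0;
        for (int value : sortedSegment) {
            if (numDocsToCollect > 0 && seen == numDocsToCollect) {
                break; // early termination: the rest of the segment is skipped
            }
            seen++;
            if (after != null && value <= after) {
                continue; // rejected by the searchAfter filter, but already counted
            }
            if (hits.size() < pageSize) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] segment = new int[1000];
        for (int i = 0; i < segment.length; i++) {
            segment[i] = i + 1; // values 1..1000, as indexed in the report
        }
        int pageSize = 10;
        List<Integer> first = page(segment, null, pageSize, pageSize);
        int after = first.get(first.size() - 1); // last value of page 1
        // Limit = pageSize: all 10 counted documents are rejected, so this prints [].
        System.out.println(page(segment, after, pageSize, pageSize));
        // Limit = already collected + pageSize: prints the values 11 through 20.
        System.out.println(page(segment, after, pageSize, first.size() + pageSize));
    }
}
```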



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


