maven-issues mailing list archives

From Tamás Cservenák (JIRA) <j...@codehaus.org>
Subject [jira] Commented: (MINDEXER-14) FlatSearchResponse.totalHits = 1000 when there are in fact more
Date Wed, 30 Mar 2011 17:11:22 GMT

    [ http://jira.codehaus.org/browse/MINDEXER-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=261960#action_261960 ]

Tamás Cservenák commented on MINDEXER-14:
-----------------------------------------

For this issue, I pushed some changes to my GitHub repo, but not yet to ASF SVN.

Please feel free to comment on them.

https://github.com/cstamas/maven-indexer/commit/d9fb3f1ef62482a28a1815fe183b0ac23158cc2e

Notes: 

- The work is not complete yet. For example, GroupedSearch is wide open to OOM (it is a
non-pageable search type, and there is no way to set a "ceiling" yet).
- As for "guessing" the topDocs number to return, we could do the "workaround loop" you described
in SearchEngine itself: after the first search it knows the exact number of hits, so if the
selected "ceiling" (used when the user does not limit the search) is too small, the search could
be redone with the proper number instead. Using MAXINT is a no-go; see the sources of Lucene's
org.apache.lucene.util.PriorityQueue<T> for the "why": it will _always_ OOM.
- Some methods and constants have a "_" prefix. Sorry for that; I just forgot to un-"_" them
after using the compiler to spot their usages. In the final fix, those names will return
to "normal".
- These changes will make the indexer a bit more OOM-prone, so integrators will need to be
more cautious.
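
A minimal sketch of the second note (redoing the search once the exact hit count is known)
might look like the following. Every name in it (TwoPassSearch, Page, BoundedSearcher,
searchAll, defaultCeiling, hardCeiling) is a hypothetical stand-in and not Maven Indexer or
Lucene API. The hard ceiling is an added assumption, included only to show one way an
integrator could keep the second pass from allocating without bound.

```java
import java.util.List;

/** Hypothetical sketch only; none of these types exist in Maven Indexer or Lucene. */
final class TwoPassSearch {

    /** One bounded search result: the true hit count plus at most 'limit' hits. */
    static final class Page {
        final int totalHits;
        final List<String> hits;

        Page(int totalHits, List<String> hits) {
            this.totalHits = totalHits;
            this.hits = hits;
        }
    }

    /** Stand-in for the underlying searcher: returns up to 'limit' hits and the real total. */
    interface BoundedSearcher {
        Page search(String query, int limit);
    }

    /**
     * First pass uses a modest default ceiling. If the reported total exceeds it,
     * the search is redone exactly once, sized to the known total.
     */
    static Page searchAll(BoundedSearcher searcher, String query,
                          int defaultCeiling, int hardCeiling) {
        Page first = searcher.search(query, defaultCeiling);
        if (first.totalHits <= defaultCeiling) {
            return first; // the first pass already holds every hit
        }
        if (first.totalHits > hardCeiling) {
            // Assumed safety valve: refuse rather than risk exhausting the heap.
            throw new IllegalStateException(
                "total hits " + first.totalHits + " exceed hard ceiling " + hardCeiling);
        }
        // Second pass: ask for exactly as many hits as are known to exist.
        return searcher.search(query, first.totalHits);
    }
}
```

With this shape the caller never has to guess a limit, and the collector is sized either to
the default ceiling or to the real hit count, never to MAXINT.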

> FlatSearchResponse.totalHits = 1000 when there are in fact more
> ---------------------------------------------------------------
>
>                 Key: MINDEXER-14
>                 URL: http://jira.codehaus.org/browse/MINDEXER-14
>             Project: Maven Indexer
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>         Environment: Ubuntu, JDK 6; cause of: https://netbeans.org/bugzilla/show_bug.cgi?id=197036
>            Reporter: Jesse Glick
>            Assignee: Tamás Cservenák
>             Fix For: 4.1.0
>
>
> I am running {{SearchEngine.searchFlatPaged}}. When there happen to be more than 1000
> hits in the result, it silently returns just 1000 instead. Surprising behavior since I did
> not specify any hit limit. But this is {{AbstractSearchRequest.UNDEFINED_HIT_LIMIT}}, OK.
> Where it gets weirder is that if you set {{resultHitLimit}} to {{UNDEFINED_HIT_LIMIT}},
> you still get 1000 results, contradicting the apparent meaning of "undefined". Further, if
> you set it to 999 or 1001, and there are a few thousand results, you get an empty result and
> {{totalHits}} of -1 or {{AbstractSearchResponse.LIMIT_EXCEEDED}} (which by the way looks like
> a constant but is not final!), which is completely different than the behavior for 1000.
> And passing in {{Integer.MAX_VALUE}} to begin with does not work, since then Lucene gets
> an {{OutOfMemoryError}} trying to allocate a ridiculously large array or similar.
> Expected behavior: by default, on an otherwise unconfigured search request, the indexer
> would return all the hits, however many that is (allocating only a proportional amount of
> memory). If I set {{resultHitLimit}} to some value, then that will be used - I will either
> get a complete set of results, or {{LIMIT_EXCEEDED}}.
> Workaround: set {{resultHitLimit}} to 1001, then go into a loop retrying the search;
> if -1 returned for {{totalHits}}, double the {{resultHitLimit}} and try again.
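
For reference, the doubling workaround quoted above could be sketched roughly as follows.
This is an illustration under assumptions, not code from the indexer: DoublingRetry,
LimitedSearch, totalHitsFor, findSufficientLimit and maxLimit are hypothetical names, and the
local LIMIT_EXCEEDED constant merely mirrors the -1 value described in the report. The
maxLimit bound is an added guard so the loop cannot double its way toward Integer.MAX_VALUE.

```java
/** Hypothetical sketch of the reporter's retry loop; not Maven Indexer API. */
final class DoublingRetry {

    /** Local stand-in for the -1 "limit exceeded" marker described in the report. */
    static final int LIMIT_EXCEEDED = -1;

    /** Stand-in for one search call: returns totalHits, or LIMIT_EXCEEDED if the limit was too low. */
    interface LimitedSearch {
        int totalHitsFor(int resultHitLimit);
    }

    /** Doubles the limit until the search stops reporting LIMIT_EXCEEDED; returns that limit. */
    static int findSufficientLimit(LimitedSearch search, int startLimit, int maxLimit) {
        if (startLimit <= 0) {
            throw new IllegalArgumentException("startLimit must be positive");
        }
        int limit = startLimit;
        while (true) {
            int totalHits = search.totalHitsFor(limit);
            if (totalHits != LIMIT_EXCEEDED) {
                return limit; // this limit was large enough for a complete result
            }
            if (limit > maxLimit / 2) {
                // Doubling again would pass maxLimit (and could overflow int), so give up.
                throw new IllegalStateException("no limit up to " + maxLimit + " was sufficient");
            }
            limit *= 2;
        }
    }
}
```

Each retry repeats the whole search, so the cost grows with the number of doublings; that is
the overhead the in-engine "redo with the exact count" idea from the comment would avoid.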

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.codehaus.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
