maven-issues mailing list archives

From Tamás Cservenák (JIRA) <j...@codehaus.org>
Subject [jira] Issue Comment Edited: (MINDEXER-14) FlatSearchResponse.totalHits = 1000 when there are in fact more
Date Thu, 31 Mar 2011 15:51:38 GMT

    [ http://jira.codehaus.org/browse/MINDEXER-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=262090#action_262090 ]

Tamás Cservenák edited comment on MINDEXER-14 at 3/31/11 10:51 AM:
-------------------------------------------------------------------

This issue is fixed. MINDEXER-22 is a follow-up that grew out of part of this issue and the earlier, messy implementation.

All search types now have the {{#setCount( maxDocs )}} method, which lets the user _limit_
(cap) the number of Lucene Documents processed while executing a search.

To check "is there more" (if you capped the search): {{response.getTotalHitCount() > maxDocs}}

This is roughly equivalent to the "hit limit" in 4.0.0, with one big difference: here,
results are still returned (but capped to just the "top maxDocs" ones).
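
A minimal sketch of the "is there more" check described above. It deliberately does not use the Maven Indexer classes: {{CappedSearchSketch}}, {{Response}}, {{search}} and {{hasMore}} are hypothetical stand-ins, and only the comparison {{getTotalHitCount() > maxDocs}} is taken from this comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types for illustration; NOT the Maven Indexer API.
final class CappedSearchSketch {

    /** Stand-in response: the capped results plus the uncapped total hit count. */
    static final class Response {
        final List<String> results;
        final int totalHitCount;

        Response(List<String> results, int totalHitCount) {
            this.results = results;
            this.totalHitCount = totalHitCount;
        }

        int getTotalHitCount() {
            return totalHitCount;
        }
    }

    /** Counts every match in the index, but keeps only the first maxDocs of them. */
    static Response search(List<String> index, String term, int maxDocs) {
        List<String> top = new ArrayList<>();
        int total = 0;
        for (String doc : index) {
            if (doc.contains(term)) {
                total++;
                if (top.size() < maxDocs) {
                    top.add(doc);
                }
            }
        }
        return new Response(top, total);
    }

    /** The check from the comment: true if the cap cut off some hits. */
    static boolean hasMore(Response response, int maxDocs) {
        return response.getTotalHitCount() > maxDocs;
    }
}
```

Unlike the 4.0.0 "hit limit", the capped response here still carries results, so a caller can show the top {{maxDocs}} entries and use {{hasMore}} to signal that the list is incomplete.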

Example: if you are integrating the indexer into a server and exposing search over REST
(à la the Nexus use case), you probably want to cap it aggressively to avoid
OutOfMemoryErrors and other possible exploits/attacks.

Also, the "workaround loop" described by Jesse is implemented and runs _under the hood_
whenever a non-capped search is executed. The initial search uses a "window" of 1000 hits;
if the results do not fit (totalHits > 1000), the same search is repeated, this time using
the exact totalHits number from the previous search. Since this could easily lead to an
OutOfMemoryError, at DEBUG logging level a "note" (with a pointer to this issue) is logged
just before the point where the error is expected.
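
The two-pass behaviour for non-capped searches could look roughly like the sketch below. This is an assumption-laden illustration, not the indexer's actual code: {{TwoPassSearchSketch}}, {{WindowedSearcher}}, {{Page}} and {{inMemory}} are hypothetical names, and only the window of 1000 and the "repeat with the exact total" step come from this comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the non-capped search strategy; NOT the Maven Indexer code.
final class TwoPassSearchSketch {

    /** Initial window size mentioned in the comment. */
    static final int INITIAL_WINDOW = 1000;

    /** Stand-in result of one search pass: collected hits plus the real total. */
    static final class Page {
        final List<Integer> hits;
        final int totalHits;

        Page(List<Integer> hits, int totalHits) {
            this.hits = hits;
            this.totalHits = totalHits;
        }
    }

    /** Stand-in searcher that collects at most `window` hits per pass. */
    interface WindowedSearcher {
        Page search(int window);
    }

    /** Returns all hits: one pass if they fit in the window, otherwise a second, exactly sized pass. */
    static List<Integer> searchAll(WindowedSearcher searcher) {
        Page first = searcher.search(INITIAL_WINDOW);
        if (first.totalHits <= INITIAL_WINDOW) {
            return first.hits;
        }
        // Second pass sized to the exact total. Memory use grows with totalHits,
        // which is why the comment warns about a possible OutOfMemoryError here.
        return searcher.search(first.totalHits).hits;
    }

    /** Fake searcher over `size` documents identified by 0..size-1, for illustration only. */
    static WindowedSearcher inMemory(int size) {
        return window -> {
            List<Integer> hits = new ArrayList<>();
            for (int i = 0; i < Math.min(window, size); i++) {
                hits.add(i);
            }
            return new Page(hits, size);
        };
    }
}
```

For example, {{TwoPassSearchSketch.searchAll(TwoPassSearchSketch.inMemory(2500))}} runs two passes and returns all 2500 hits, while a 10-document index is answered by the first pass alone.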

      was (Author: cstamas):
    This issue is fixed. MINDEXER-22 is a followup from part of this issue and earlier messy
implementation.

All search types now have the {{#setCount( maxDocs )}} method, that allows the user to _limit_
(cap) the number of Lucene Documents to process while executing search.

To check "is there more" (if you capped the search): {{response.getTotalHitCount() > maxDocs}}

This above would be somewhat equal to the "hit limit" with 4.0.0, with big difference, that
here, results are still returned (but capped, just the "top maxDocs" one).

Example: If integrating into server and exposing search over rest (a la Nexus use case), you
probably want to cap it aggressively to avoid OOMExes.

Also, the "workaround loop" described by Jesse is implemented and happens _undercover_ when
a non-capped search happens. Initial search happens with a "window" of 1000 hits, and if it
does not fit (totalHits > 1000), it repeats the same search, but this time uses the exact
totalHit number from previous search. Since this could easily lead to OOMEx, in DEBUG logging
level a "note" (and pointer to this issue) is logged just before the point where OOMEx is
expected.
  
> FlatSearchResponse.totalHits = 1000 when there are in fact more
> ---------------------------------------------------------------
>
>                 Key: MINDEXER-14
>                 URL: http://jira.codehaus.org/browse/MINDEXER-14
>             Project: Maven Indexer
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>         Environment: Ubuntu, JDK 6; cause of: https://netbeans.org/bugzilla/show_bug.cgi?id=197036
>            Reporter: Jesse Glick
>            Assignee: Tamás Cservenák
>             Fix For: 4.1.0
>
>
> I am running {{SearchEngine.searchFlatPaged}}. When there happen to be more than 1000
> hits in the result, it silently returns just 1000 instead. Surprising behavior since I did
> not specify any hit limit. But this is {{AbstractSearchRequest.UNDEFINED_HIT_LIMIT}}, OK.
> Where it gets weirder is that if you set {{resultHitLimit}} to {{UNDEFINED_HIT_LIMIT}},
> you still get 1000 results, contradicting the apparent meaning of "undefined". Further, if
> you set it to 999 or 1001, and there are a few thousand results, you get an empty result and
> {{totalHits}} of -1 or {{AbstractSearchResponse.LIMIT_EXCEEDED}} (which by the way looks like
> a constant but is not final!), which is completely different than the behavior for 1000.
> And passing in {{Integer.MAX_VALUE}} to begin with does not work, since then Lucene gets
> an {{OutOfMemoryError}} trying to allocate a ridiculously large array or similar.
> Expected behavior: by default, on an otherwise unconfigured search request, the indexer
> would return all the hits, however many that is (allocating only a proportional amount of
> memory). If I set {{resultHitLimit}} to some value, then that will be used - I will either
> get a complete set of results, or {{LIMIT_EXCEEDED}}.
> Workaround: set {{resultHitLimit}} to 1001, then go into a loop retrying the search;
> if -1 returned for {{totalHits}}, double the {{resultHitLimit}} and try again.
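
The workaround in the issue description can be sketched as a doubling loop. The sketch below is only an illustration under simplifying assumptions: {{DoublingLimitSketch}}, {{LimitedSearcher}}, {{findWorkingLimit}} and {{withHits}} are hypothetical names, the fake searcher simply returns -1 whenever the real hit count exceeds the limit, and the special-cased behaviour at exactly 1000 reported for 4.0.0 is not modelled.

```java
// Hypothetical sketch of the reporter's workaround; NOT the Maven Indexer API.
final class DoublingLimitSketch {

    /** Value the issue reports for totalHits when the hit limit is exceeded. */
    static final int LIMIT_EXCEEDED = -1;

    /** Stand-in searcher: returns totalHits, or LIMIT_EXCEEDED if the limit is too small. */
    interface LimitedSearcher {
        int totalHits(int resultHitLimit);
    }

    /** Starts at 1001 and doubles the limit until the search no longer reports LIMIT_EXCEEDED. */
    static int findWorkingLimit(LimitedSearcher searcher) {
        int limit = 1001;
        while (searcher.totalHits(limit) == LIMIT_EXCEEDED) {
            if (limit > Integer.MAX_VALUE / 2) {
                // Guard against int overflow rather than looping forever.
                throw new IllegalStateException("hit limit would overflow");
            }
            limit *= 2;
        }
        return limit;
    }

    /** Fake searcher with a fixed number of real hits, for illustration only. */
    static LimitedSearcher withHits(int actualHits) {
        return limit -> actualHits > limit ? LIMIT_EXCEEDED : actualHits;
    }
}
```

With 3000 real hits the loop tries limits of 1001 and 2002 before succeeding at 4004, so the retry count grows only logarithmically with the number of hits.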

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.codehaus.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
