nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-44) too many search results
Date Fri, 15 Feb 2008 21:54:09 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569428#action_12569428 ]

Andrzej Bialecki  commented on NUTCH-44:
----------------------------------------

The name of the property is somewhat misleading, because it applies to both the Web GUI and
the OpenSearch servlet. Can we come up with a better name (and a shorter one too ;) )?

Also, this patch doesn't solve the whole issue, though it addresses the specific scenario
described by the reporter. In general, even if hitsPerPage is small, it is still very expensive
to retrieve a page of results far down the list, e.g. results 10000-10010. Currently Nutch
will attempt to retrieve 10 results no matter how far down the list the starting point is,
which opens a potential way to launch a DoS attack. Still, we can fix this issue first and
address the deep-paging problem in a new issue.
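
To illustrate the two limits discussed above, here is a minimal sketch in Java. It is not
taken from the attached patch. The class name, the cap values, and the helper methods are
all hypothetical, and it assumes the searcher has to collect the top (start + hitsPerPage)
hits to serve one page, which is how top-N ranking commonly works.

```java
// Hypothetical sketch: none of these names come from Nutch or from NUTCH-44.patch.
final class PagingLimits {
    // Assumed caps; in a real deployment these would be read from configuration.
    static final int MAX_HITS_PER_PAGE = 100;
    static final int MAX_START = 1000;

    /** Clamp value into the inclusive range [min, max]. */
    static int clamp(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }

    /**
     * Number of top-ranked hits that must be collected to serve one page,
     * under the assumption stated above. A long avoids int overflow.
     */
    static long hitsToCollect(int start, int hitsPerPage) {
        return (long) start + hitsPerPage;
    }

    public static void main(String[] args) {
        // Values as they might arrive from the request URL.
        int requestedStart = 10000;
        int requestedHitsPerPage = 20000;

        // Limit both the page size and how deep into the list a request may go.
        int start = clamp(requestedStart, 0, MAX_START);
        int hitsPerPage = clamp(requestedHitsPerPage, 1, MAX_HITS_PER_PAGE);

        System.out.println("unclamped: " + hitsToCollect(requestedStart, requestedHitsPerPage));
        System.out.println("clamped:   " + hitsToCollect(start, hitsPerPage));
    }
}
```

Capping hitsPerPage alone covers the reporter's URL, but a request with a small hitsPerPage
and a very large start would still be costly under the stated assumption, which is why the
sketch clamps both values.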

> too many search results
> -----------------------
>
>                 Key: NUTCH-44
>                 URL: https://issues.apache.org/jira/browse/NUTCH-44
>             Project: Nutch
>          Issue Type: Bug
>          Components: web gui
>         Environment: web environment
>            Reporter: Emilijan Mirceski
>            Assignee: Dennis Kubes
>         Attachments: NUTCH-44.patch
>
>
> There should be a user-defined limit on the number of results the search engine can return.
> For example, if one modifies the search URL as:
> http://<my>/search.jsp?query=<some query>&hitsPerPage=20000&hitsPerSite=0
> the search will try to return 20,000 pages, which isn't good for server-side performance.
>
> Is it possible to have a setting in the config XML files to control this?
> Thanks,
> Emilijan

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.