nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2141) Change the InteractiveSelenium plugin handler Interface to return page content
Date Sun, 18 Oct 2015 19:38:05 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962574#comment-14962574 ]

ASF GitHub Bot commented on NUTCH-2141:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/nutch/pull/77


> Change the InteractiveSelenium plugin handler Interface to return page content
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-2141
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2141
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>            Reporter: Balaji Gurumurthy
>            Assignee: Chris A. Mattmann
>              Labels: selenium
>
> The handler interface in the protocol-interactiveselenium plugin currently provides methods
> to manipulate the page content, and the HTTPResponse class reads the page content from the
> driver. This limits the amount of HTML content that can be returned to Nutch.
> The processDriver method could return a String object instead. This is particularly helpful
> in cases such as handling pagination, where the content of multiple pages can be appended and
> returned from the handler.
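
The pagination case described above can be sketched roughly as follows. This is
a minimal illustration, not the plugin's actual code: only the method name
processDriver and its String return type come from the issue description. The
PageDriver interface is a hypothetical stand-in for the Selenium driver so the
sketch stays self-contained, and ContentReturningHandler, PaginationHandler,
FakeDriver and the maxPages limit are likewise assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the Selenium driver; only what this sketch needs. */
interface PageDriver {
    /** Returns the HTML of the page currently loaded. */
    String getPageSource();

    /** Moves to the next page; returns false when there is no next page. */
    boolean clickNext();
}

/** Hypothetical handler contract in which processDriver returns the content. */
interface ContentReturningHandler {
    String processDriver(PageDriver driver);
}

/** Appends the content of successive pages and returns it as one String. */
class PaginationHandler implements ContentReturningHandler {
    private final int maxPages; // assumed safety limit, not from the issue

    PaginationHandler(int maxPages) {
        this.maxPages = maxPages;
    }

    @Override
    public String processDriver(PageDriver driver) {
        StringBuilder accumulated = new StringBuilder();
        int pages = 0;
        do {
            // Collect the current page before attempting to advance.
            accumulated.append(driver.getPageSource());
            pages++;
        } while (pages < maxPages && driver.clickNext());
        return accumulated.toString();
    }
}

/** In-memory driver used only to exercise the sketch without a browser. */
class FakeDriver implements PageDriver {
    private final List<String> pages;
    private int index = 0;

    FakeDriver(List<String> pages) {
        this.pages = new ArrayList<>(pages);
    }

    @Override
    public String getPageSource() {
        return pages.get(index);
    }

    @Override
    public boolean clickNext() {
        if (index + 1 >= pages.size()) {
            return false;
        }
        index++;
        return true;
    }
}
```

Because the handler returns the accumulated String, the caller is no longer
limited to whatever single page the driver holds when the handler finishes.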



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
