nutch-dev mailing list archives

From "Michael Joyce (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2141) Change the InteractiveSelenium plugin handler Interface to return page content
Date Thu, 15 Oct 2015 18:21:05 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959345#comment-14959345 ]

Michael Joyce commented on NUTCH-2141:
--------------------------------------

This was actually brought up in NUTCH-2108. There's also an [example handler | https://github.com/apache/nutch/blob/trunk/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/handlers/DefalultMultiInteractionHandler.java]
that was added to illustrate that as well. The handler won't actually be run multiple times,
so if you need to return concatenated content, you need to build it in the handler and make sure
the handler returns it.
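A minimal sketch of the pattern described above, assuming a handler interface whose processDriver returns the page content as a String. `Driver`, `nextPage()`, `PaginatingHandler` and `FakeDriver` are hypothetical stand-ins so the example runs without Selenium; a real handler would receive a Selenium `WebDriver` and locate and click the site's "next" link itself. Because the handler runs only once per URL, it walks the pages and concatenates their source before returning.

```java
import java.util.List;

public class PaginatingHandlerSketch {

    // Hypothetical stand-in for Selenium's WebDriver so the sketch runs without a browser.
    interface Driver {
        String getPageSource();

        // Hypothetical: moves to the next page, returns false when there is none.
        boolean nextPage();
    }

    // Assumed handler shape: processDriver returns the content to be stored.
    interface InteractiveSeleniumHandler {
        String processDriver(Driver driver);

        boolean shouldProcessURL(String url);
    }

    // Invoked once per URL, so it visits every page itself and concatenates the content.
    static class PaginatingHandler implements InteractiveSeleniumHandler {
        private final int maxPages;

        PaginatingHandler(int maxPages) {
            this.maxPages = maxPages;
        }

        @Override
        public String processDriver(Driver driver) {
            StringBuilder content = new StringBuilder(driver.getPageSource());
            int pages = 1;
            // Stop at the last page or at the illustrative page cap, whichever comes first.
            while (pages < maxPages && driver.nextPage()) {
                content.append('\n').append(driver.getPageSource());
                pages++;
            }
            return content.toString();
        }

        @Override
        public boolean shouldProcessURL(String url) {
            return true;
        }
    }

    // Fake driver over a fixed list of pages, for demonstration only.
    static class FakeDriver implements Driver {
        private final List<String> pages;
        private int index = 0;

        FakeDriver(List<String> pages) {
            this.pages = pages;
        }

        public String getPageSource() {
            return pages.get(index);
        }

        public boolean nextPage() {
            if (index + 1 >= pages.size()) {
                return false;
            }
            index++;
            return true;
        }
    }

    public static void main(String[] args) {
        Driver driver = new FakeDriver(List.of("<p>page 1</p>", "<p>page 2</p>", "<p>page 3</p>"));
        String result = new PaginatingHandler(10).processDriver(driver);
        System.out.println(result);
    }
}
```

The `maxPages` cap is only an illustrative safeguard against pagination that never ends; it is not a Nutch configuration property.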

> Change the InteractiveSelenium plugin handler Interface to return page content
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-2141
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2141
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>            Reporter: Balaji Gurumurthy
>              Labels: selenium
>
> The handler interface in the protocol-interactiveselenium plugin currently provides methods
> to manipulate the page content, and the HTTPResponse class reads the page content from the
> driver. This limits the amount of HTML content that can be returned to Nutch.
> The processDriver method could return a String object instead. This is particularly helpful
> in cases such as handling pagination, where the content of multiple pages can be appended and
> returned from the handler.
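To illustrate the limitation the issue describes, here is a hedged comparison of the two handler shapes against a fake paginated page. `FakeDriver`, `VoidHandler`, `ContentHandler`, `fetchWithVoidHandler` and `fetchWithContentHandler` are hypothetical names for this illustration, not Nutch's actual classes; the void-returning caller mimics the described behaviour of reading the content back from the driver after the handler finishes.

```java
import java.util.List;

public class ReturnContentSketch {

    // Hypothetical minimal stand-in for Selenium's WebDriver.
    static class FakeDriver {
        private final List<String> pages;
        private int index = 0;

        FakeDriver(List<String> pages) {
            this.pages = pages;
        }

        String getPageSource() {
            return pages.get(index);
        }

        boolean nextPage() {
            if (index + 1 >= pages.size()) {
                return false;
            }
            index++;
            return true;
        }
    }

    // Shape described in the issue: the handler only manipulates the page.
    interface VoidHandler {
        void processDriver(FakeDriver driver);
    }

    // Proposed shape: the handler returns the content to store.
    interface ContentHandler {
        String processDriver(FakeDriver driver);
    }

    // Caller for the void shape: it can only read what the driver shows at the end.
    static String fetchWithVoidHandler(VoidHandler h, FakeDriver d) {
        h.processDriver(d);
        return d.getPageSource();
    }

    // Caller for the proposed shape: it stores whatever the handler returns.
    static String fetchWithContentHandler(ContentHandler h, FakeDriver d) {
        return h.processDriver(d);
    }

    public static void main(String[] args) {
        List<String> pages = List.of("<p>1</p>", "<p>2</p>");

        // Paginating with the void shape: earlier pages are lost.
        String viaVoid = fetchWithVoidHandler(d -> { while (d.nextPage()) { } }, new FakeDriver(pages));

        // Paginating with the proposed shape: the handler accumulates every page.
        String viaReturn = fetchWithContentHandler(d -> {
            StringBuilder sb = new StringBuilder(d.getPageSource());
            while (d.nextPage()) {
                sb.append(d.getPageSource());
            }
            return sb.toString();
        }, new FakeDriver(pages));

        System.out.println(viaVoid);   // <p>2</p>
        System.out.println(viaReturn); // <p>1</p><p>2</p>
    }
}
```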



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
