nutch-dev mailing list archives

From "Balaji Gurumurthy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2141) Change the InteractiveSelenium plugin handler Interface to return page content
Date Thu, 15 Oct 2015 21:13:05 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959627#comment-14959627 ]

Balaji Gurumurthy commented on NUTCH-2141:
------------------------------------------

When we concatenate the content from multiple pages and then try to load it back into the
browser using JavascriptExecutor, we frequently get exceptions ("Unterminated string literal"
and "Missing ; before statement", to name two) while executing the JavaScript string. Debugging
these errors in the concatenated content of all the pages is a pain.
Instead of concatenating the content, loading it back into the driver, and then reading it
from the driver again in the HTTPResponse class, simply returning the concatenated result to
Nutch seemed better.
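
For illustration, here is a minimal sketch of how such errors can arise. The exact script that was
executed is not shown in this thread, so the `document.body.innerHTML = '...'` form, the class
name JsLiteralDemo, and all method names below are assumptions made for the example. It shows
that splicing raw HTML into a JavaScript string literal leaves quotes and newlines unescaped,
and what escaping the most common offenders would look like.

```java
// Hypothetical demo: the script form and all names here are assumptions,
// not code taken from the protocol-interactiveselenium plugin.
class JsLiteralDemo {

    // Naive approach (assumed): splice raw HTML into a single-quoted JS string literal.
    // A quote or a raw newline inside the HTML ends or breaks the literal.
    static String naiveScript(String html) {
        return "document.body.innerHTML = '" + html + "';";
    }

    // Escape the most common characters that break a single-quoted JS string literal.
    static String escapeForJsLiteral(String html) {
        StringBuilder sb = new StringBuilder(html.length() + 16);
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break; // backslash -> \\
                case '\'': sb.append("\\'");  break; // quote     -> \'
                case '\n': sb.append("\\n");  break; // newline   -> \n
                case '\r': sb.append("\\r");  break; // CR        -> \r
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Same script as naiveScript, but with the HTML escaped first.
    static String safeScript(String html) {
        return "document.body.innerHTML = '" + escapeForJsLiteral(html) + "';";
    }

    public static void main(String[] args) {
        String pages = "<p>it's page one</p>\n<p>page two</p>";
        System.out.println(naiveScript(pages)); // broken: raw quote and newline in the literal
        System.out.println(safeScript(pages));  // quote and newline are escaped
    }
}
```

Selenium's JavascriptExecutor.executeScript also accepts extra arguments that the script can read
as arguments[0], which avoids building the literal by hand. Returning the concatenated string
directly, as proposed here, sidesteps the round trip through the browser altogether.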

> Change the InteractiveSelenium plugin handler Interface to return page content
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-2141
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2141
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>            Reporter: Balaji Gurumurthy
>              Labels: selenium
>
> The handler interface in the protocol-interactiveselenium plugin currently provides methods
> to manipulate the page content, and the HTTPResponse class reads the page content from the
> driver. This limits the amount of HTML content that can be returned to Nutch.
> The processDriver method could return a String object instead. This is particularly helpful
> in cases such as handling pagination, where the content of multiple pages can be appended and
> returned from the handler.
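
The proposal in the description above could look roughly like the following sketch. It is not the
plugin's actual code: PagedBrowser is a hypothetical stand-in for the parts of a Selenium
WebDriver the example needs, and ContentReturningHandler, PaginationHandler, and FakeBrowser are
names invented here. Only the idea of processDriver returning a String comes from the issue.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: every type below is hypothetical and exists to keep the example self-contained.
class PaginationDemo {

    // Stand-in for the browser driver (assumption, not the Selenium API).
    interface PagedBrowser {
        String getPageSource();  // HTML of the page currently loaded
        boolean goToNextPage();  // true if a next page was loaded, false at the last page
    }

    // Assumed shape of the proposed contract: return the content instead of void.
    interface ContentReturningHandler {
        String processDriver(PagedBrowser browser);
    }

    // Appends the source of every page and returns the accumulated result to the caller.
    static class PaginationHandler implements ContentReturningHandler {
        @Override
        public String processDriver(PagedBrowser browser) {
            StringBuilder accumulated = new StringBuilder();
            do {
                accumulated.append(browser.getPageSource());
            } while (browser.goToNextPage());
            return accumulated.toString();
        }
    }

    // In-memory fake used to run the sketch without a real browser.
    // Assumes the list holds at least one page.
    static class FakeBrowser implements PagedBrowser {
        private final List<String> pages;
        private int index = 0;

        FakeBrowser(List<String> pages) {
            this.pages = pages;
        }

        @Override
        public String getPageSource() {
            return pages.get(index);
        }

        @Override
        public boolean goToNextPage() {
            if (index + 1 >= pages.size()) {
                return false;
            }
            index++;
            return true;
        }
    }

    public static void main(String[] args) {
        PagedBrowser browser =
            new FakeBrowser(Arrays.asList("<p>one</p>", "<p>two</p>", "<p>three</p>"));
        System.out.println(new PaginationHandler().processDriver(browser));
    }
}
```

Because the handler hands the string straight back, the caller never has to write the combined
HTML into the browser and read it out again.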



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
