nutch-dev mailing list archives

From "Brian Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Created] (NUTCH-2273) Selenium and InteractiveSelenium Do Not Support HTTPS
Date Thu, 02 Jun 2016 16:09:59 GMT
Brian Zhao created NUTCH-2273:
---------------------------------

             Summary: Selenium and InteractiveSelenium Do Not Support HTTPS
                 Key: NUTCH-2273
                 URL: https://issues.apache.org/jira/browse/NUTCH-2273
             Project: Nutch
          Issue Type: Bug
          Components: plugin
    Affects Versions: 1.11
            Reporter: Brian Zhao


Neither the Selenium nor the InteractiveSelenium plugin declares the https protocol in
its plugin.xml, so neither plugin will fetch https links.

To fix this for the Selenium plugin, add:
  
      <implementation id="org.apache.nutch.protocol.selenium.Http"
                      class="org.apache.nutch.protocol.selenium.Http">
         <parameter name="protocolName" value="https"/>
      </implementation>

to Selenium's plugin.xml, as a child element of the "extension" element.

An HTTPS implementation already exists in protocol-http's HttpResponse.java, and I've merged
it into Selenium's HttpResponse.java here: http://pastebin.com/ZAPfwee4
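
For readers who want a rough idea of what HTTPS handling at the socket level involves, here is a minimal sketch. It is not the code from the pastebin link above and not the protocol-http implementation; the class HttpsSocketSketch and its methods are hypothetical names used only for illustration. It assumes the usual approach of opening a plain TCP socket and, when the scheme is https, layering TLS on top with the JDK's default SSLSocketFactory.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical helper, NOT part of Nutch: illustrates scheme-dependent socket setup.
final class HttpsSocketSketch {

  /** True when the URI scheme is https (case-insensitive). */
  static boolean isHttps(URI uri) {
    return "https".equalsIgnoreCase(uri.getScheme());
  }

  /** Returns the explicit port, or the scheme default (443 for https, 80 otherwise). */
  static int resolvePort(URI uri) {
    if (uri.getPort() != -1) {
      return uri.getPort();
    }
    return isHttps(uri) ? 443 : 80;
  }

  /** Opens a TCP connection and, for https URIs, wraps it in a TLS socket. */
  static Socket open(URI uri, int timeoutMillis) throws IOException {
    String host = uri.getHost();
    int port = resolvePort(uri);

    Socket socket = new Socket();
    socket.setSoTimeout(timeoutMillis);
    socket.connect(new InetSocketAddress(host, port), timeoutMillis);

    if (!isHttps(uri)) {
      return socket; // plain http: use the TCP socket as-is
    }

    try {
      // Layer TLS over the already-connected socket using the JDK defaults.
      SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
      SSLSocket sslSocket = (SSLSocket) factory.createSocket(socket, host, port, true);
      sslSocket.setUseClientMode(true);
      sslSocket.startHandshake();
      return sslSocket;
    } catch (IOException e) {
      socket.close(); // do not leak the underlying connection if the handshake fails
      throw e;
    }
  }
}
```

A caller would obtain the connection with something like HttpsSocketSketch.open(URI.create("https://example.org/"), 10000) and then read and write the HTTP request and response over the returned socket exactly as it would for plain http.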

The same changes should probably be made for the InteractiveSelenium plugin.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
