nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2576) HTTP protocol plugin based on okhttp
Date Thu, 24 May 2018 15:45:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489246#comment-16489246 ]

ASF GitHub Bot commented on NUTCH-2576:
---------------------------------------

sebastian-nagel commented on issue #328: NUTCH-2576 HTTP protocol implementation based on
okhttp
URL: https://github.com/apache/nutch/pull/328#issuecomment-391711875
 
 
   Done:
   - large-scale test (distributed mode on CDH 5.14.2): 195 million pages fetched from 28
million hosts (90 million hosts in CrawlDb) in 4 cycles using 48 Fetcher tasks, each with 120
threads. No issues with the connection pool, or at least none noticeable as long as the lock
waits described or linked in [NUTCH-2578](https://issues.apache.org/jira/browse/NUTCH-2578)
remain unaddressed.
   
   New TODOs:
   - setting/using cookies
   - re-throw exceptions as HttpException so that they can be handled by the Fetcher
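A minimal sketch of the second TODO, assuming the plugin wraps each okhttp call and converts any failure into one checked exception type that the Fetcher already handles. `HttpException` below is a local stand-in defined only for this example (not Nutch's actual class), `fetchOrRethrow` is a hypothetical helper name, and the real okhttp call is replaced by a `Callable` so the sketch runs with the JDK alone.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

public class RethrowSketch {

  /** Hypothetical stand-in for Nutch's HttpException; not the real class. */
  public static class HttpException extends Exception {
    public HttpException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /**
   * Runs a fetch action (in the real plugin this would wrap the okhttp call)
   * and wraps any exception in an HttpException. The original exception is
   * kept as the cause so the caller can still inspect what went wrong.
   */
  public static <T> T fetchOrRethrow(String url, Callable<T> fetch) throws HttpException {
    try {
      return fetch.call();
    } catch (HttpException e) {
      throw e; // already the expected type: pass through unchanged
    } catch (Exception e) {
      throw new HttpException("Failed to fetch " + url + ": " + e, e);
    }
  }

  public static void main(String[] args) {
    try {
      // Simulate a network timeout raised by the underlying HTTP client.
      fetchOrRethrow("http://example.com/", () -> {
        throw new SocketTimeoutException("read timed out");
      });
    } catch (HttpException e) {
      // Prints the simple name of the wrapped cause: SocketTimeoutException
      System.out.println(e.getCause().getClass().getSimpleName());
    }
  }
}
```

Keeping the original exception as the cause means a caller can still distinguish, say, a timeout from a refused connection after the rethrow.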



> HTTP protocol plugin based on okhttp
> ------------------------------------
>
>                 Key: NUTCH-2576
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2576
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin, protocol
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.15
>
>
> [OkHttp|http://square.github.io/okhttp/] is an Apache 2.0-licensed HTTP library which supports
> HTTP/2. [~jnioche]'s implementation [storm-crawler#443|https://github.com/DigitalPebble/storm-crawler/issues/443]
> proves that it should be straightforward to implement a Nutch protocol plugin using okhttp.
> A recent HTTP protocol implementation should also fix (most of) the issues reported in NUTCH-2549.


