nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2576) HTTP protocol plugin based on okhttp
Date Wed, 09 May 2018 12:53:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468806#comment-16468806 ]

ASF GitHub Bot commented on NUTCH-2576:
---------------------------------------

jnioche commented on issue #328: NUTCH-2576 HTTP protocol implementation based on okhttp
URL: https://github.com/apache/nutch/pull/328#issuecomment-387728096
 
 
   @sebastian-nagel, one thing I noticed with OkHttp is that its [ConnectionPool](https://github.com/square/okhttp/blob/master/okhttp/src/main/java/okhttp3/ConnectionPool.java)
(default maxIdle 5, with eviction after 5 mins) struggles when used with many threads and
different hostnames, which would typically be the case with Nutch (and StormCrawler). I have
seen an average of 1.5s, and up to 6s, of contention on the ConnectionPool. My guess is that [the
cleanup method](https://github.com/square/okhttp/blob/master/okhttp/src/main/java/okhttp3/ConnectionPool.java#L199)
and its synchronized block are the main culprit: it iterates over all the connections but evicts
only the one that has been idle the longest.
   
   Apart from that, OkHttp is great: pretty robust and less arcane than Apache HttpClient, IMHO.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> HTTP protocol plugin based on okhttp
> ------------------------------------
>
>                 Key: NUTCH-2576
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2576
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin, protocol
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.15
>
>
> [OkHttp|http://square.github.io/okhttp/] is an Apache 2.0-licensed HTTP library that supports
> HTTP/2. [~jnioche]'s implementation [storm-crawler#443|https://github.com/DigitalPebble/storm-crawler/issues/443]
> shows that it should be straightforward to implement a Nutch protocol plugin using OkHttp.
> A recent HTTP protocol implementation should also fix (most of) the issues reported in NUTCH-2549.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
