nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2579) Fetcher to use parsed URL to call ProtocolFactory.getProtocol(url)
Date Thu, 24 May 2018 15:58:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489272#comment-16489272 ]

ASF GitHub Bot commented on NUTCH-2579:
---------------------------------------

sebastian-nagel opened a new pull request #334: NUTCH-2579 Fetcher to use parsed URL to call ProtocolFactory.getProtocol(url)
URL: https://github.com/apache/nutch/pull/334
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Fetcher to use parsed URL to call ProtocolFactory.getProtocol(url)
> ------------------------------------------------------------------
>
>                 Key: NUTCH-2579
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2579
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher, protocol
>    Affects Versions: 1.14
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.15
>
>
> The call to ProtocolFactory.getProtocol(url) is synchronized and makes threads wait for the
> lock in a multi-threaded fetcher. It takes the URL string, although it would be more efficient
> to use the parsed URL held in the FetchItem, so that the lock could be released faster. In addition,
> parsing the URL itself also takes a lock in the URL stream handler:
> {noformat}
> "FetcherThread" #37 daemon prio=5 os_prio=0 tid=0x00007f21edea2000 nid=0x5c20 waiting for monitor entry [0x00007f21bacb4000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.util.Hashtable.get(Hashtable.java:363)
>         - waiting to lock <0x00000005e01b5840> (a java.util.Hashtable)
>         at java.net.URL.getURLStreamHandler(URL.java:1135)
>         at java.net.URL.<init>(URL.java:599)
>         at java.net.URL.<init>(URL.java:490)
>         at java.net.URL.<init>(URL.java:439)
>         at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:74)
>         - locked <0x00000005fc5f4fb8> (a org.apache.nutch.protocol.ProtocolFactory)
>         at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:299)
> {noformat}
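
The idea in the description can be sketched as follows. This is a minimal illustration, not the Nutch code or the change in the pull request: SketchProtocolFactory, its String return values, and the lookup() helper are hypothetical names invented for this example. It only shows why handing over an already parsed java.net.URL keeps the URL constructor (and the stream handler lookup seen in the thread dump) out of the synchronized section.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a protocol factory; NOT the Nutch class. */
class SketchProtocolFactory {
  // Cache of per-scheme handlers; plain strings stand in for protocol plugins.
  private final Map<String, String> cache = new HashMap<>();

  /** String variant: the URL is parsed while the factory lock is held. */
  public synchronized String getProtocol(String urlString)
      throws MalformedURLException {
    URL url = new URL(urlString); // parsing happens inside the lock
    return lookup(url.getProtocol());
  }

  /** URL variant: the caller parsed the URL earlier, so the lock
   *  only covers the cache lookup. */
  public synchronized String getProtocol(URL url) {
    return lookup(url.getProtocol());
  }

  // Called only from the synchronized methods above.
  private String lookup(String scheme) {
    return cache.computeIfAbsent(scheme, s -> "handler-for-" + s);
  }
}
```

A caller that already holds a parsed URL (as the description says the FetchItem does) would use the second overload; both overloads return the same handler for the same scheme.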



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
