nutch-dev mailing list archives

From "Sebastian Nagel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2585) NPE in TrieStringMatcher
Date Fri, 01 Jun 2018 17:19:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16498278#comment-16498278 ]

Sebastian Nagel commented on NUTCH-2585:
----------------------------------------

The only issue I can see from the stack is a potential race condition if two threads concurrently
fill the sorted array "children". One thread may set "childrenList = null" while the other
has already passed the "children == null" check and is still using "childrenList":
{code:title=TrieStringMatcher.java}
    TrieNode getChild(char nextChar) {
      if (children == null) {
        children = childrenList.toArray(new TrieNode[childrenList.size()]);
        childrenList = null;
        Arrays.sort(children);
      }
{code}

Given that urlfilter-suffix is frequently used and has not changed in a long time, this sounds
like a plausible cause: the chance of hitting this race condition is very low, which would
explain why it has not been reported before.
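
One possible way to close the window, sketched below under assumptions: the lazy conversion is guarded by a lock, and the array is published to "children" only after it has been sorted. This is not the Nutch code and not necessarily the fix that will be applied. The class LazyChildrenSketch, its nested Node class, and addChild() are hypothetical stand-ins for TrieStringMatcher.TrieNode, reduced to the part relevant here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for TrieStringMatcher.TrieNode (not the Nutch class).
class LazyChildrenSketch {

  static class Node implements Comparable<Node> {
    final char nodeChar;
    // Filled while the trie is built, dropped once converted to the sorted array.
    private List<Node> childrenList = new ArrayList<>();
    // volatile: readers see either null or a completely sorted array.
    private volatile Node[] children;

    Node(char nodeChar) {
      this.nodeChar = nodeChar;
    }

    synchronized void addChild(Node child) {
      if (childrenList == null) {
        throw new IllegalStateException("children already converted to array");
      }
      childrenList.add(child);
    }

    Node getChild(char nextChar) {
      Node[] sorted = children;
      if (sorted == null) {
        synchronized (this) {
          sorted = children; // re-check under the lock
          if (sorted == null) {
            sorted = childrenList.toArray(new Node[0]);
            Arrays.sort(sorted);
            childrenList = null;
            children = sorted; // publish only after sorting
          }
        }
      }
      // Binary search on the sorted array.
      int lo = 0;
      int hi = sorted.length - 1;
      while (lo <= hi) {
        int mid = (lo + hi) >>> 1;
        char c = sorted[mid].nodeChar;
        if (c == nextChar) {
          return sorted[mid];
        } else if (c < nextChar) {
          lo = mid + 1;
        } else {
          hi = mid - 1;
        }
      }
      return null;
    }

    @Override
    public int compareTo(Node other) {
      return Character.compare(nodeChar, other.nodeChar);
    }
  }

  public static void main(String[] args) throws Exception {
    Node root = new Node('\0');
    for (char c : "zyxcba".toCharArray()) {
      root.addChild(new Node(c));
    }
    // Several threads trigger the lazy conversion at the same time.
    Thread[] threads = new Thread[8];
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(() -> {
        if (root.getChild('c') == null) {
          throw new AssertionError("child 'c' not found");
        }
      });
      threads[i].start();
    }
    for (Thread t : threads) {
      t.join();
    }
    System.out.println("lookup ok: " + root.getChild('c').nodeChar);
  }
}
```

The lock is only taken until the array has been published, so lookups on a fully initialized node stay lock-free. Converting all lists eagerly once the trie is built, before any fetcher thread uses the matcher, would be an alternative that needs no synchronization in getChild() at all.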

> NPE in TrieStringMatcher
> ------------------------
>
>                 Key: NUTCH-2585
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2585
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.14
>            Reporter: Markus Jelsma
>            Priority: Major
>             Fix For: 1.15
>
>
> Stumbled on this one just now:
> {code}
> 2018-05-25 14:29:31,844 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 42 fetch of http://www.ndcmediagroep.nl/wp-content/uploads/2017/03/Leaflet-Noflik-Wenje.pdf failed with: java.lang.NullPointerException
> 	at org.apache.nutch.util.TrieStringMatcher$TrieNode.getChild(TrieStringMatcher.java:107)
> 	at org.apache.nutch.util.SuffixStringMatcher.shortestMatch(SuffixStringMatcher.java:74)
> 	at org.apache.nutch.urlfilter.suffix.SuffixURLFilter.filter(SuffixURLFilter.java:164)
> 	at org.apache.nutch.net.URLFilters.filter(URLFilters.java:43)
> 	at org.apache.nutch.fetcher.FetcherThread.handleRedirect(FetcherThread.java:487)
> 	at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:404)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)