nutch-dev mailing list archives

From "AppChecker (JIRA)" <>
Subject [jira] [Created] (NUTCH-2394) Possible bugs in the source code
Date Mon, 12 Jun 2017 22:27:00 GMT
AppChecker created NUTCH-2394:

             Summary: Possible bugs in the source code
                 Key: NUTCH-2394
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 1.13
            Reporter: AppChecker

I've checked your project with the static analyzer [AppChecker|]
and it found several suspicious code fragments:
1) [src/plugin/headings/src/java/org/apache/nutch/parse/headings/|]

heading is not changed, because java.lang.String.trim returns a new string
instead of modifying the one it is called on.
Probably, it should be:
heading = heading.trim();

see also:
* [src/plugin/urlnormalizer-host/src/java/org/apache/nutch/net/urlnormalizer/host/|]
* [src/java/org/apache/nutch/indexwriter/elastic/|]
* [src/java/org/apache/nutch/net/urlnormalizer/protocol/|]
* [src/java/org/apache/nutch/net/urlnormalizer/slash/|]
* [src/java/org/apache/nutch/indexer/more/|]
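
The String.trim finding in item 1 can be reproduced in isolation. The following is a minimal sketch, not the Nutch code: the class TrimExample and both method names are hypothetical, and only the variable name heading comes from the report.

```java
class TrimExample {

    // Hypothetical reproduction of the reported pattern: the trimmed copy
    // returned by trim() is discarded, so the caller sees the input unchanged.
    static String trimDiscarded(String heading) {
        heading.trim();
        return heading;
    }

    // The correction suggested in the report: keep the value trim() returns.
    static String trimAssigned(String heading) {
        heading = heading.trim();
        return heading;
    }

    public static void main(String[] args) {
        // Brackets make the surviving whitespace visible.
        System.out.println("[" + trimDiscarded("  Title  ") + "]");
        System.out.println("[" + trimAssigned("  Title  ") + "]");
    }
}
```

Because Java strings are immutable, the same reasoning applies to any other String method whose result is ignored, which is presumably why the analyzer flags the additional files listed above.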

2) [src/java/org/apache/nutch/crawl/|]

if (mode.equals(PARTITION_MODE_DOMAIN) && url != null)
else if ..
  InetAddress address = InetAddress.getByName(url.getHost());
If url is null, the first condition is false and execution falls through to a branch
that calls url.getHost(), so a NullPointerException will be thrown.
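
One way to avoid the problem in item 2 is to check url for null once, before any branch dereferences it. The sketch below only illustrates that idea under stated assumptions: the class NullGuardExample, the method partitionKey, the parse helper, the mode constants and the fallback parameter are all hypothetical, and returning the host name in domain mode is a simplification rather than what Nutch does.

```java
import java.net.InetAddress;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.UnknownHostException;

class NullGuardExample {

    // Hypothetical mode names, chosen for this sketch only.
    static final String MODE_DOMAIN = "byDomain";
    static final String MODE_IP = "byIP";

    // Returns a key for the URL, or the fallback when no key can be derived.
    static String partitionKey(String mode, URL url, String fallback) {
        // Single null check up front, so no later branch can dereference null.
        if (url == null) {
            return fallback;
        }
        if (MODE_DOMAIN.equals(mode)) {
            // Simplification: the host name stands in for the domain.
            return url.getHost();
        } else if (MODE_IP.equals(mode)) {
            try {
                return InetAddress.getByName(url.getHost()).getHostAddress();
            } catch (UnknownHostException e) {
                return fallback;
            }
        }
        return url.getHost();
    }

    // Hypothetical helper that turns a string into a URL without a checked exception.
    static URL parse(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(partitionKey(MODE_IP, null, "unknown"));
        System.out.println(partitionKey(MODE_IP, parse("http://127.0.0.1/page"), "unknown"));
    }
}
```

Writing the comparison as MODE_IP.equals(mode) also keeps the method safe if mode itself is null, which the reported condition would not be.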

3) [src/java/org/apache/nutch/tools/|]

String[] fullPathLevels = fullDir.split(File.separator);
Using File.separator as a regular expression may throw java.util.regex.PatternSyntaxException,
because it is "\" on Windows-based systems and a lone backslash is not a valid pattern.
Possible correction:
String[] fullPathLevels = fullDir.split(Pattern.quote(File.separator));
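
The effect of Pattern.quote in item 3 can be checked on any platform by passing the separator explicitly instead of reading File.separator. This is a small sketch under that assumption; the class SplitSeparatorExample and its methods splitPath and rawSplitFails are hypothetical names, not part of Nutch.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SplitSeparatorExample {

    // Splits on the separator taken literally, as in the suggested correction.
    static String[] splitPath(String path, String separator) {
        return path.split(Pattern.quote(separator));
    }

    // Reports whether using the separator directly as a regex is rejected.
    static boolean rawSplitFails(String separator) {
        try {
            "a".split(separator);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // "\\" in Java source is the single backslash used on Windows.
        System.out.println(rawSplitFails("\\"));
        System.out.println(rawSplitFails("/"));
        for (String level : splitPath("C:\\crawl\\segments", "\\")) {
            System.out.println(level);
        }
    }
}
```

Pattern.quote wraps its argument so that every character is matched literally, so the quoted form behaves the same for "/" and "\".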

This message was sent by Atlassian JIRA
