[ https://issues.apache.org/jira/browse/NUTCH-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-1884:
----------------------------------------
Fix Version/s: 2.4 (was: 2.3)
> NullPointerException in parsechecker and indexchecker with symlinks in file URL
> -------------------------------------------------------------------------------
>
> Key: NUTCH-1884
> URL: https://issues.apache.org/jira/browse/NUTCH-1884
> Project: Nutch
> Issue Type: Bug
> Components: indexer, parser
> Affects Versions: 1.9, 2.2.1
> Environment: Mac OS X 10.9.2
> Apache Maven 2.2.1
> Java version: 1.7.0_51
> Reporter: Mengying Wang
> Priority: Minor
> Fix For: 2.4, 1.10
>
> Attachments: NUTCH-1884-trunk-v1.patch
>
>
> I downloaded the Nutch source code from GitHub (https://github.com/apache/nutch), applied the patches from NUTCH-1879 and NUTCH-1880, and then reinstalled Nutch. The good news is that all URLs now contain only one slash. Unfortunately, the java.lang.NullPointerException warning/error still occurs with both the parsechecker and indexchecker commands.
> Below is the log output:
> (1) $ ./nutch parsechecker "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> signature: 17bdb44990391c96bb8d48d1802ff11c
> Couldn't pass score, url file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/ (java.lang.NullPointerException)
> ---------
> Url
> ---------------
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/
> ---------
> ParseData
> ---------
> Version: 5
> Status: success(1,0)
> Title: Index of /Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml
> Outlinks: 2
> outlink: toUrl: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/ anchor: ../
> outlink: toUrl: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/monitor.xml anchor: monitor.xml
> Content Metadata: Content-Length=352 nutch.crawl.score=0.0 Last-Modified=Tue, 14 Oct 2014 20:05:50 GMT Content-Type=text/html
> Parse Metadata: CharEncodingForConversion=windows-1252 OriginalCharEncoding=windows-1252
> (2) $ ./nutch indexchecker "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:139)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.nutch.indexer.IndexingFiltersChecker.main(IndexingFiltersChecker.java:177)
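A note on the logs above: both commands are run against .../cas-curator/..., but the Url section of the parsechecker output reports .../cas-curator-0.6/..., which suggests the symlink was resolved somewhere between fetching and parsing. One plausible (unconfirmed) reading of the NullPointerException is that the result is stored under the resolved URL while the checker looks it up under the requested URL and gets null back. The Java sketch below reproduces only that key mismatch using java.nio.file; the class SymlinkUrlMismatch and its methods fetchAndParse, naiveLookup and defensiveLookup are hypothetical stand-ins, not Nutch APIs, and the fallback in defensiveLookup is an illustration, not the attached patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class SymlinkUrlMismatch {

  // Hypothetical stand-in for a fetch+parse step that, like the log above,
  // reports its result under the symlink-resolved (canonical) URL.
  static Map<String, String> fetchAndParse(Path requested) throws IOException {
    Path canonical = requested.toRealPath(); // follows symbolic links
    Map<String, String> results = new HashMap<>();
    results.put(canonical.toUri().toString(), "Index of " + canonical);
    return results;
  }

  // Lookup keyed by the requested URL: returns null when a symlink was resolved,
  // and any caller that dereferences the result without a check will throw an NPE.
  static String naiveLookup(Map<String, String> results, Path requested) {
    return results.get(requested.toUri().toString());
  }

  // Hypothetical defensive variant: if the requested URL is not a key and there is
  // exactly one result, use that result instead of returning null.
  static String defensiveLookup(Map<String, String> results, Path requested) {
    String hit = results.get(requested.toUri().toString());
    if (hit == null && results.size() == 1) {
      hit = results.values().iterator().next();
    }
    return hit;
  }

  public static void main(String[] args) throws IOException {
    // Recreate the layout from the report: cas-curator -> cas-curator-0.6
    Path tmp = Files.createTempDirectory("nutch1884");
    Files.createDirectories(tmp.resolve("cas-curator-0.6/staging/products/xml"));
    Files.createSymbolicLink(tmp.resolve("cas-curator"), tmp.resolve("cas-curator-0.6"));
    Path requested = tmp.resolve("cas-curator/staging/products/xml");

    Map<String, String> results = fetchAndParse(requested);
    System.out.println("requested: " + requested.toUri());
    System.out.println("stored:    " + results.keySet());
    System.out.println("naive:     " + naiveLookup(results, requested)); // null
    System.out.println("defensive: " + defensiveLookup(results, requested));
  }
}
```

Run it on a system that permits symbolic links (e.g. Linux or macOS); Files.createSymbolicLink throws on platforms or accounts that do not allow them.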
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)