nutch-dev mailing list archives

From "Sebastian Nagel (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (NUTCH-1884) NullPointerException in parsechecker and indexchecker with symlinks in file URL
Date Thu, 06 Nov 2014 22:01:35 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel resolved NUTCH-1884.
------------------------------------
    Resolution: Fixed

Committed to trunk/1.x, r1637237. Nutch 2.x is not affected because there is
no ParseResult :) Thanks!

> NullPointerException in parsechecker and indexchecker with symlinks in file URL
> -------------------------------------------------------------------------------
>
>                 Key: NUTCH-1884
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1884
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer, parser
>    Affects Versions: 1.9
>         Environment: Mac OS X 10.9.2
> Apache Maven 2.2.1
> Java version: 1.7.0_51
>            Reporter: Mengying Wang
>            Priority: Minor
>             Fix For: 1.10
>
>         Attachments: NUTCH-1884-trunk-v1.patch
>
>
> I downloaded the Nutch source code from GitHub (https://github.com/apache/nutch),
> applied the patches (NUTCH-1879 and NUTCH-1880), and reinstalled Nutch. The good
> news is that all URLs now contain only one slash. Unfortunately, the
> java.lang.NullPointerException warning/error still occurs for both the
> parsechecker and indexchecker commands.
> Below is the running log:
> (1) $ ./nutch parsechecker "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> signature: 17bdb44990391c96bb8d48d1802ff11c
> Couldn't pass score, url file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/ (java.lang.NullPointerException)
> ---------
> Url
> ---------------
> file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/
> ---------
> ParseData
> ---------
> Version: 5
> Status: success(1,0)
> Title: Index of /Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml
> Outlinks: 2
>   outlink: toUrl: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/ anchor: ../
>   outlink: toUrl: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator-0.6/staging/products/xml/monitor.xml anchor: monitor.xml
> Content Metadata: Content-Length=352 nutch.crawl.score=0.0 Last-Modified=Tue, 14 Oct 2014 20:05:50 GMT Content-Type=text/html
> Parse Metadata: CharEncodingForConversion=windows-1252 OriginalCharEncoding=windows-1252

> (2) $ ./nutch indexchecker "file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/"
> fetching: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> parsing: file:/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
> contentType: text/html
> Exception in thread "main" java.lang.NullPointerException
> 	at org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:139)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> 	at org.apache.nutch.indexer.IndexingFiltersChecker.main(IndexingFiltersChecker.java:177)

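For readers skimming the log above: the requested URL contains cas-curator, while the parsed output reports cas-curator-0.6, which suggests the symlink was resolved somewhere between fetching and parsing. The sketch below illustrates, in plain Java, how a result keyed by the resolved URL could yield null when looked up by the requested URL, and one defensive way to handle that. It is an illustration under assumptions, not the committed patch: the class SymlinkLookupSketch, its methods, and the shortened file:/data/... paths are all hypothetical, and a plain Map stands in for Nutch's ParseResult.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for a parse result keyed by URL.
public class SymlinkLookupSketch {

    // Assumption: the parse step stores its output under the URL the content
    // was actually read from (i.e. after the symlink has been resolved).
    static Map<String, String> parse(String resolvedUrl) {
        Map<String, String> result = new LinkedHashMap<>();
        result.put(resolvedUrl, "Index of " + resolvedUrl);
        return result;
    }

    // Naive lookup by the URL the user typed. Returns null when the stored
    // key differs, and dereferencing that null is what throws an NPE.
    static String naiveLookup(Map<String, String> result, String requestedUrl) {
        return result.get(requestedUrl);
    }

    // Defensive lookup: if nothing is stored under the requested URL,
    // fall back to the first entry instead of returning null.
    static String safeLookup(Map<String, String> result, String requestedUrl) {
        String parse = result.get(requestedUrl);
        if (parse == null && !result.isEmpty()) {
            parse = result.values().iterator().next();
        }
        return parse;
    }

    public static void main(String[] args) {
        // Hypothetical paths; "cas-curator" is assumed to be a symlink
        // pointing at "cas-curator-0.6".
        String requested = "file:/data/cas-curator/staging/products/xml/";
        String resolved = "file:/data/cas-curator-0.6/staging/products/xml/";

        Map<String, String> result = parse(resolved);
        System.out.println("naive: " + naiveLookup(result, requested));
        System.out.println("safe:  " + safeLookup(result, requested));
    }
}
```

Falling back to the first entry is only one possible choice; a checker could equally report the mismatch and exit, or look up the result by the URL recorded on the fetched content.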


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
