nutch-dev mailing list archives

From "Michael Nebel (JIRA)" <j...@apache.org>
Subject [jira] Updated: (NUTCH-89) parse-rss null pointer exception
Date Sat, 10 Sep 2005 17:43:30 GMT
     [ http://issues.apache.org/jira/browse/NUTCH-89?page=all ]

Michael Nebel updated NUTCH-89:
-------------------------------

    Attachment: parse-rss.20050910.patch

> parse-rss null pointer exception
> --------------------------------
>
>          Key: NUTCH-89
>          URL: http://issues.apache.org/jira/browse/NUTCH-89
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Versions: 0.7, 0.8-dev
>     Reporter: Michael Nebel
>  Attachments: parse-rss.20050910.patch
>
> The rss-parser throws a NullPointerException. The cause is a syntax error in the fetched page.
> When it hits such a page, the parser tries to add an outlink with null as its anchor. The anchor
> of an outlink must not be null.
> java.lang.NullPointerException
>         at org.apache.nutch.io.UTF8.writeString(UTF8.java:236)
>         at org.apache.nutch.parse.Outlink.write(Outlink.java:51)
>         at org.apache.nutch.parse.ParseData.write(ParseData.java:111)
>         at org.apache.nutch.io.SequenceFile$Writer.append(SequenceFile.java:137)
>         at org.apache.nutch.io.MapFile$Writer.append(MapFile.java:127)
>         at org.apache.nutch.io.ArrayFile$Writer.append(ArrayFile.java:39)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:281)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)
> Exception in thread "main" java.lang.RuntimeException: SEVERE error logged.  Exiting fetcher.
>         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:354)
>         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:488)
>         at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:140)
> I suggest the following patch:
> Index: src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss/RSSParser.java
> ===================================================================
> --- src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss/RSSParser.java     (revision 279397)
> +++ src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss/RSSParser.java     (working copy)
> @@ -157,11 +157,13 @@
>                  if (r.getLink() != null) {
>                      try {
>                          // get the outlink
> -                        theOutlinks.add(new Outlink(r.getLink(), r
> -                                .getDescription()));
> +                       if (r.getDescription()!= null ) {
> +                           theOutlinks.add(new Outlink(r.getLink(), r.getDescription()));
> +                       } else {
> +                           theOutlinks.add(new Outlink(r.getLink(), ""));
> +                       }
>                      } catch (MalformedURLException e) {
> -                        LOG
> -                                .info("nutch:parse-rss:RSSParser Exception: MalformedURL: "
> +                        LOG.info("nutch:parse-rss:RSSParser Exception: MalformedURL: "
>                                          + r.getLink()
>                                          + ": Attempting to continue processing outlinks");
>                          e.printStackTrace();
> @@ -185,12 +187,13 @@
>  
>                      if (whichLink != null) {
>                          try {
> -                            theOutlinks.add(new Outlink(whichLink, theRSSItem
> -                                    .getDescription()));
> -
> +                           if (theRSSItem.getDescription()!=null) {
> +                               theOutlinks.add(new Outlink(whichLink, theRSSItem.getDescription()));
> +                           } else {
> +                               theOutlinks.add(new Outlink(whichLink, ""));
> +                           }
>                          } catch (MalformedURLException e) {
> -                            LOG
> -                                    .info("nutch:parse-rss:RSSParser Exception: MalformedURL: "
> +                            LOG.info("nutch:parse-rss:RSSParser Exception: MalformedURL: "
>                                              + whichLink
>                                              + ": Attempting to continue processing outlinks");
>                              e.printStackTrace();
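
Both hunks of the patch repeat the same null check on the description. As a minimal sketch only, the check could be factored into one small helper. The class name AnchorDefaults and the method anchorOrEmpty are hypothetical: they are not part of Nutch or of the attached patch, and the sample strings in main merely stand in for values returned by getDescription().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Nutch): normalizes a possibly-null RSS description. */
final class AnchorDefaults {

    private AnchorDefaults() {}

    /** Returns the description unchanged, or "" when the feed item has none. */
    static String anchorOrEmpty(String description) {
        return description != null ? description : "";
    }

    public static void main(String[] args) {
        // Stand-ins for getDescription() results; the second item has no description.
        String[] descriptions = {"Release notes", null};
        List<String> anchors = new ArrayList<>();
        for (String d : descriptions) {
            // Every stored anchor is non-null, so later serialization cannot hit a null string.
            anchors.add(anchorOrEmpty(d));
        }
        System.out.println(anchors);
    }
}
```

With a helper like this, each call site in RSSParser.java could shrink to a single line such as theOutlinks.add(new Outlink(whichLink, AnchorDefaults.anchorOrEmpty(theRSSItem.getDescription()))); the behaviour would match the if/else in the patch.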

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
