nutch-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Update of "RunNutchInEclipse1.0" by FrankMcCown
Date Thu, 16 Apr 2009 18:38:48 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by FrankMcCown:
http://wiki.apache.org/nutch/RunNutchInEclipse1%2e0

The comment on the change is:
Added fix for RTFParseFactory issues

------------------------------------------------------------------------------
  Copy the jar files into src/plugin/parse-mp3/lib and src/plugin/parse-rtf/lib/ respectively.
  Then add the jar files to the build path (First refresh the workspace by pressing F5. Then
right-click the project folder > Build Path > Configure Build Path...  Then select the
Libraries tab, click "Add Jars..." and then add each .jar file individually).
  
+ === Two Errors with RTFParseFactory ===
+ 
+ If you are trying to build the official 1.0 release, Eclipse will report two errors in RTFParseFactory, even after you add the RTF jar file in the previous step. The problem was fixed (see [http://issues.apache.org/jira/browse/NUTCH-644 NUTCH-644] and [http://issues.apache.org/jira/browse/NUTCH-705 NUTCH-705]), but the fix was not included in the official 1.0 release because of licensing issues, so you will need to edit the code manually to remove these two build errors.
+ 
+ In RTFParseFactory.java:
+  1. Add the following import statement: {{{import org.apache.nutch.parse.ParseResult;}}}
+ 
+  2. Change the getParse method signature from
+ 
+ {{{
+ public Parse getParse(Content content) {
+ }}}
+ to
+ {{{
+ public ParseResult getParse(Content content) {
+ }}}
+  1.#3 In the getParse function, replace
+ {{{
+ return new ParseStatus(ParseStatus.FAILED,
+                                ParseStatus.FAILED_EXCEPTION,
+                                e.toString()).getEmptyParse(conf);
+ }}}
+ with
+ {{{
+ return new ParseStatus(ParseStatus.FAILED,
+                        ParseStatus.FAILED_EXCEPTION,
+                        e.toString()).getEmptyParseResult(content.getUrl(), getConf());
+ }}}
+  1.#4 In the getParse function, replace
+ {{{
+ return new ParseImpl(text,
+                          new ParseData(ParseStatus.STATUS_SUCCESS,
+                                        title,
+                                        OutlinkExtractor.getOutlinks(text, this.conf),
+                                        content.getMetadata(),
+                                        metadata));
+ }}}
+ with
+ {{{
+ return ParseResult.createParseResult(content.getUrl(),
+     new ParseImpl(text,
+         new ParseData(ParseStatus.STATUS_SUCCESS,
+                       title,
+                       OutlinkExtractor.getOutlinks(text, this.conf),
+                       content.getMetadata(),
+                       metadata)));
+ }}}
+ 
+ In TestRTFParser.java, replace
+ {{{
+ parse = new ParseUtil(conf).parseByExtensionId("parse-rtf", content);
+ }}}
+ with
+ {{{
+ parse = new ParseUtil(conf).parseByExtensionId("parse-rtf", content).get(urlString);
+ }}}
+ 
+ Once you have made these changes and saved the files, Eclipse should build with no errors.
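
The changes above all follow from one idea: getParse now returns a container of parses keyed by URL instead of a single parse, which is why the test has to call .get(urlString) on the result. The following is a minimal sketch of that pattern only. SimpleParse, SimpleParseResult, parseNew and the example URL are hypothetical stand-ins invented for illustration; they are not the Nutch classes, and the sketch runs without Nutch on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParseResultSketch {

    /** Hypothetical stand-in for one parsed document (NOT Nutch's Parse). */
    static final class SimpleParse {
        final String text;

        SimpleParse(String text) {
            this.text = text;
        }
    }

    /** Hypothetical stand-in for a URL-keyed container (NOT Nutch's ParseResult). */
    static final class SimpleParseResult {
        private final Map<String, SimpleParse> parses = new LinkedHashMap<>();

        /** Wraps a single parse under the URL it was produced from. */
        static SimpleParseResult create(String url, SimpleParse parse) {
            SimpleParseResult result = new SimpleParseResult();
            result.parses.put(url, parse);
            return result;
        }

        /** Returns the parse stored for the URL, or null if there is none. */
        SimpleParse get(String url) {
            return parses.get(url);
        }

        int size() {
            return parses.size();
        }
    }

    /** New-style parser: returns the container rather than the bare parse. */
    static SimpleParseResult parseNew(String url, String content) {
        return SimpleParseResult.create(url, new SimpleParse(content.trim()));
    }

    public static void main(String[] args) {
        String urlString = "http://example.com/test.rtf"; // assumed example URL
        // The caller looks the parse up by URL, mirroring .get(urlString) in the test.
        SimpleParse parse = parseNew(urlString, "  sample text  ").get(urlString);
        System.out.println(parse.text);
    }
}
```

In this sketch a lookup with a URL that was never stored returns null, so the caller has to pass the same URL that the parser used as the key.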
  
  === Build Nutch ===
  If you set up the project correctly, Eclipse will build Nutch for you into "tmp_build". See below for problems you could run into.
