nutch-dev mailing list archives

From "jcore_XiaTian (JIRA)" <j...@apache.org>
Subject [jira] Created: (NUTCH-745) MyHtmlParser getParse return not null,so all Analyzer-(zh|fr) cannot run
Date Fri, 10 Jul 2009 05:40:14 GMT
MyHtmlParser getParse return not null,so all Analyzer-(zh|fr) cannot run
------------------------------------------------------------------------

                 Key: NUTCH-745
                 URL: https://issues.apache.org/jira/browse/NUTCH-745
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 1.0.0
         Environment: JDK 1.6 + Tomcat 6 + Eclipse 3.3 + Nutch 1.0
            Reporter: jcore_XiaTian


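The custom MyHtmlParser.getParse below returns a non-null (failed, empty) ParseResult instead of null. As a result, the parse-html plugin listed after it for text/html is never reached, the HTML parse filters in HtmlParser.getParse never run, and the analysis-(zh|fr) analyzers cannot take effect.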

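    public ParseResult getParse(Content content) {
        // Report a failed parse ("missing content") wrapped in a ParseResult.
        // The returned object is non-null even though parsing failed.
        return ParseResult.createParseResult(content.getUrl(),
            new ParseStatus(ParseStatus.FAILED,
                            ParseStatus.FAILED_MISSING_CONTENT,
                            "No textual content available").getEmptyParse(conf));

        // return null;
    }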
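To illustrate why a non-null result matters, here is a simplified, self-contained model of a caller that tries the text/html parsers in their configured order. It assumes, as the report implies, that the caller stops at the first result that is non-null and non-empty. The classes Result and NamedParser and the methods parse and runChain are hypothetical stand-ins, not Nutch's ParseUtil or ParseResult.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified model of "try each parser for a MIME type in order".
// None of these types are Nutch classes.
public class ParserChainSketch {

    // Stand-in for a parse result: a success flag plus the parse entries it holds.
    static final class Result {
        final boolean success;
        final List<String> entries;

        Result(boolean success, List<String> entries) {
            this.success = success;
            this.entries = entries;
        }

        boolean isEmpty() {
            return entries.isEmpty();
        }
    }

    // Stand-in for a parser plugin: an id plus a parse function.
    static final class NamedParser {
        final String id;
        final Function<String, Result> parser;

        NamedParser(String id, Function<String, Result> parser) {
            this.id = id;
            this.parser = parser;
        }
    }

    // Like MyHtmlParser above: reports failure but still returns one (empty) entry.
    static final NamedParser FAILED_BUT_NON_NULL =
        new NamedParser("parse-myHtml", c -> new Result(false, Collections.singletonList(c)));

    // Variant corresponding to the commented-out "return null;".
    static final NamedParser RETURNS_NULL = new NamedParser("parse-myHtml", c -> null);

    // Working stand-in for parse-html.
    static final NamedParser HTML =
        new NamedParser("parse-html", c -> new Result(true, Collections.singletonList(c)));

    // Try parsers in order; stop at the first non-null, non-empty result.
    // The ids of the parsers actually invoked are appended to 'invoked'.
    static Result parse(List<NamedParser> parsers, String content, List<String> invoked) {
        for (NamedParser p : parsers) {
            invoked.add(p.id);
            Result r = p.parser.apply(content);
            if (r != null && !r.isEmpty()) {
                return r;
            }
        }
        return null; // no parser produced a usable result
    }

    // Run 'first' followed by parse-html and report which parsers were invoked.
    static List<String> runChain(NamedParser first) {
        List<String> invoked = new ArrayList<>();
        parse(Arrays.asList(first, HTML), "<html/>", invoked);
        return invoked;
    }

    public static void main(String[] args) {
        System.out.println(runChain(FAILED_BUT_NON_NULL)); // prints [parse-myHtml]
        System.out.println(runChain(RETURNS_NULL));        // prints [parse-myHtml, parse-html]
    }
}
```

With the failed-but-non-null parser first, only parse-myHtml is invoked; with the null-returning variant, the model falls through to parse-html.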

========nutch-site.xml=======
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(myHtml|html|text|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|language-identifier|analysis-(zh)</value>
  <description></description>
</property>
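plugin.includes is a regular expression over plugin ids. The title mentions analysis-(zh|fr), but this value names only analysis-(zh). The small check below tests a few ids against the value above, assuming the whole plugin id must match the expression (full match, as java.util.regex.Matcher.matches() does); the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Checks which plugin ids the plugin.includes value above would match,
// assuming a full match of the id against the expression.
public class PluginIncludesCheck {

    // Value copied from the nutch-site.xml property above.
    static final String INCLUDES =
        "protocol-http|urlfilter-regex|parse-(myHtml|html|text|js)|index-(basic|anchor)"
        + "|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic"
        + "|urlnormalizer-(pass|regex|basic)|language-identifier|analysis-(zh)";

    private static final Pattern PATTERN = Pattern.compile(INCLUDES);

    // True if the whole id matches the expression.
    static boolean included(String pluginId) {
        return PATTERN.matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String[] ids = {"parse-myHtml", "parse-html", "language-identifier", "analysis-zh", "analysis-fr"};
        for (String id : ids) {
            System.out.println(id + " -> " + included(id));
        }
    }
}
```

Of the ids listed, only analysis-fr fails to match, so with this value a French analyzer plugin would not be loaded regardless of the parser issue.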
==========parse-plugins.xml============
<mimeType name="text/html">
    <plugin id="parse-myHtml" />
    <plugin id="parse-html" />
</mimeType>
<alias name="parse-myHtml"
       extension-id="org.apache.nutch.parse.html.MyHtmlParser" />

===src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java========
public ParseResult getParse(Content content) {
.....
// never reached when parse-myHtml returns a non-null result:
ParseResult filteredParse = this.htmlParseFilters.filter(content, parseResult,
                                                         metaTags, root);
.......
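The report's reasoning is that the analyzers depend on work done by the HTML parse filters, which HtmlParser.getParse applies only after it has parsed the page itself. The sketch below models that dependency with plain Java maps. The LANGUAGE_TAGGER filter, the "lang" key, and the process helper are hypothetical stand-ins (assumed here to play the role of a language-identification filter), not Nutch's API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model: parse filters run only inside the HTML parser's getParse,
// so skipping that parser also skips every filter. Not Nutch's API.
public class FilterChainSketch {

    // Hypothetical filter that tags the parse metadata with a language code.
    static final Consumer<Map<String, String>> LANGUAGE_TAGGER =
        metadata -> metadata.put("lang", "zh");

    // Models the HtmlParser step: build metadata, then apply each filter in order.
    static Map<String, String> htmlParse(String html, List<Consumer<Map<String, String>>> filters) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put("length", Integer.toString(html.length()));
        for (Consumer<Map<String, String>> filter : filters) {
            filter.accept(metadata);
        }
        return metadata;
    }

    // If an earlier parser's (failed) result is accepted, htmlParse never runs
    // and the returned metadata carries no filter output at all.
    static Map<String, String> process(String html, boolean earlierParserAccepted) {
        if (earlierParserAccepted) {
            return new HashMap<>();
        }
        return htmlParse(html, Collections.singletonList(LANGUAGE_TAGGER));
    }

    public static void main(String[] args) {
        System.out.println(process("<html/>", true).get("lang"));  // prints null
        System.out.println(process("<html/>", false).get("lang")); // prints zh
    }
}
```

In this model, anything downstream that keys off "lang" sees nothing when the earlier parser's result is accepted, which mirrors the symptom described in the report.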



-- 
This message is automatically generated by JIRA.

