nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (NUTCH-1253) Incompatible neko and xerces versions
Date Thu, 07 Feb 2013 04:21:39 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1253:
----------------------------------------

    Attachment: TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt
                NUTCH-1253-2.x-v2.patch

Patch for 2.x (hopefully the same as for 1.x).
Tests in TestDOMContentUtils are failing, which indicates that something is not working correctly; the test output is attached.
I've had enough for today and am heading home; my head is bursting.
                
> Incompatible neko and xerces versions
> -------------------------------------
>
>                 Key: NUTCH-1253
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1253
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.4
>         Environment: Ubuntu 10.04
>            Reporter: Dennis Spathis
>            Assignee: Lewis John McGibbney
>             Fix For: 1.7, 2.2
>
>         Attachments: NUTCH-1253-2.x-v2.patch, NUTCH-1253-nutchgora.patch, NUTCH-1253.patch, TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt
>
>
> The Nutch 1.4 distribution includes
>  - nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib-nekohtml)
>  - xercesImpl-2.9.1.jar (under .../runtime/local/lib)
> These two JARs appear to be incompatible versions. When the HtmlParser (configured to
> use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError.
> (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a
> catch(Throwable) clause in the getParse method to log the stacktrace.)
> I found that substituting a later, compatible version of nekohtml (1.9.11)
> fixes the problem.
> Curiously, and in support of the above, the nekohtml plugin.xml file in
> Nutch 1.4 contains the following:
> <plugin
>    id="lib-nekohtml"
>    name="CyberNeko HTML Parser"
>    version="1.9.11"
>    provider-name="org.cyberneko">
>    <runtime>
>        <library name="nekohtml-0.9.5.jar">
>            <export name="*"/>
>        </library>
>    </runtime>
> </plugin>
> Note the conflicting version numbers (version tag is "1.9.11" but the
> specified library is "nekohtml-0.9.5.jar").
> Was the 0.9.5 version included by mistake? Was the intention rather to
> include 1.9.11?
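
The mismatch described above (plugin version "1.9.11" against library "nekohtml-0.9.5.jar") could in principle be detected mechanically. The following is only a rough sketch, not anything that ships with Nutch: the class name PluginVersionCheck and both of its methods are hypothetical, and it assumes the JAR file name ends in "-<dotted version>.jar" and that the plugin version attribute is meant to mirror the bundled library version, which the issue suggests but does not confirm.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of Nutch: compares the version attribute of
// a <plugin> element with the version embedded in its first <library> name.
class PluginVersionCheck {
    // Assumed naming convention: "<artifact>-<dotted version>.jar".
    private static final Pattern JAR_VERSION =
        Pattern.compile("-(\\d+(?:\\.\\d+)*)\\.jar$");

    // Returns the dotted version from a JAR file name, or null if none is found.
    static String versionFromJarName(String jarName) {
        Matcher m = JAR_VERSION.matcher(jarName);
        return m.find() ? m.group(1) : null;
    }

    // True when the declared plugin version equals the version in the JAR name.
    static boolean isConsistent(String pluginXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    pluginXml.getBytes(StandardCharsets.UTF_8)));
            Element plugin = doc.getDocumentElement();
            String declared = plugin.getAttribute("version");
            Element library =
                (Element) plugin.getElementsByTagName("library").item(0);
            if (library == null) {
                return false; // nothing to compare against
            }
            String bundled = versionFromJarName(library.getAttribute("name"));
            return declared.equals(bundled);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse plugin XML", e);
        }
    }

    public static void main(String[] args) {
        // The plugin.xml content quoted in the issue description.
        String xml =
            "<plugin id=\"lib-nekohtml\" name=\"CyberNeko HTML Parser\""
            + " version=\"1.9.11\" provider-name=\"org.cyberneko\">"
            + "<runtime><library name=\"nekohtml-0.9.5.jar\">"
            + "<export name=\"*\"/></library></runtime></plugin>";
        System.out.println("consistent: " + isConsistent(xml));
    }
}
```

Run against the plugin.xml quoted in the description, this reports the descriptor as inconsistent. A check like this only compares strings; it says nothing about whether a given nekohtml version is binary compatible with a given xercesImpl version.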

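On the note in the description about adding a catch(Throwable) clause: AbstractMethodError is a subclass of java.lang.Error, not java.lang.Exception, so a catch(Exception) block does not intercept it. The sketch below illustrates only that Java behaviour. It does not reproduce the actual Nutch getParse method; the class ThrowableCatchDemo and its describe method are hypothetical, and the error is thrown by hand to simulate the linkage failure rather than produced by real incompatible JARs.

```java
// Hypothetical demo, not Nutch code: shows why catch(Exception) misses
// an AbstractMethodError while catch(Throwable) sees it.
class ThrowableCatchDemo {
    // Runs the call and reports which catch clause handled the failure.
    static String describe(Runnable call) {
        try {
            try {
                call.run();
                return "no failure";
            } catch (Exception e) {
                // Not reached for Errors such as AbstractMethodError.
                return "caught as Exception: " + e.getClass().getName();
            }
        } catch (Throwable t) {
            // Errors fall through the inner catch and land here.
            return "caught only as Throwable: " + t.getClass().getName();
        }
    }

    public static void main(String[] args) {
        // Simulated failure; in the issue it arises from mismatched JARs.
        String result = describe(() -> {
            throw new AbstractMethodError("simulated missing method");
        });
        System.out.println(result);
    }
}
```

Catching Throwable is useful here as a temporary diagnostic to get the stack trace into the log; whether it belongs in the plugin permanently is a separate question.
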
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
