nutch-dev mailing list archives

From "Asheesh Laroia (JIRA)" <j...@apache.org>
Subject [jira] Created: (NUTCH-906) Nutch OpenSearch sometimes raises DOMExceptions due to Lucene column names not being valid XML tag names
Date Mon, 13 Sep 2010 19:14:36 GMT
Nutch OpenSearch sometimes raises DOMExceptions due to Lucene column names not being valid
XML tag names
--------------------------------------------------------------------------------------------------------

                 Key: NUTCH-906
                 URL: https://issues.apache.org/jira/browse/NUTCH-906
             Project: Nutch
          Issue Type: Bug
          Components: web gui
    Affects Versions: 1.1
         Environment: Debian GNU/Linux 64-bit
            Reporter: Asheesh Laroia


The Nutch FAQ explains that OpenSearch results include "all fields that are available at search
result time." Each field is emitted as an XML tag named after its Lucene column. However, a
Lucene column name can start with a digit, and a valid XML tag name cannot. When Nutch generates
OpenSearch results for a document that has such a column, the underlying Xerces library throws
this exception:

org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML character is specified.


So I have written a patch that tests strings before they are used to generate tags within
OpenSearch.
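
For illustration only, here is a minimal sketch of that kind of check. It is not the attached patch. The class and method names (XmlNameCheck, isValidElementName, addField, newDocument) are hypothetical, and silently skipping an invalid name is an assumption; the real fix might rename or escape the field instead. The check simply asks the DOM implementation whether it accepts the name.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not code from Nutch or from the patch.
class XmlNameCheck {

    // Creates an empty DOM document, wrapping the checked exception.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the DOM implementation accepts the name as an element name.
    // The probe element is created but never attached to the document tree.
    static boolean isValidElementName(Document doc, String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        try {
            doc.createElement(name);
            return true;
        } catch (DOMException e) {
            return false;
        }
    }

    // Adds <fieldName>value</fieldName> under parent.
    // Assumption: fields whose names are not valid tag names are skipped.
    static void addField(Document doc, Element parent, String fieldName, String value) {
        if (!isValidElementName(doc, fieldName)) {
            return;
        }
        Element child = doc.createElement(fieldName);
        child.appendChild(doc.createTextNode(value));
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element item = doc.createElement("item");
        doc.appendChild(item);
        addField(doc, item, "title", "Example page");   // valid name, added
        addField(doc, item, "2010date", "2010-09-13");  // starts with a digit, skipped
        System.out.println(item.getChildNodes().getLength());
    }
}
```

With a check like this, a column whose name starts with a digit no longer aborts the whole response, at the cost of that field being missing from the output.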

I hope you merge this, or a better version of the patch!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

