nutch-dev mailing list archives

From "KuroSaka TeruHiko (JIRA)" <j...@apache.org>
Subject [jira] Updated: (NUTCH-145) build of war file fails on Chinese (zh) .xml files due to UTF-8 BOM
Date Tue, 20 Dec 2005 00:09:31 GMT
     [ http://issues.apache.org/jira/browse/NUTCH-145?page=all ]

KuroSaka TeruHiko updated NUTCH-145:
------------------------------------

    Summary: build of war file fails on Chinese (zh) .xml files due to UTF-8 BOM
             (was: ant build of the war fie fails on Chinese (zh) .xml files due to UTF-8 BOM)

> build of war file fails on Chinese (zh) .xml files due to UTF-8 BOM
> -------------------------------------------------------------------
>
>          Key: NUTCH-145
>          URL: http://issues.apache.org/jira/browse/NUTCH-145
>      Project: Nutch
>         Type: Bug
>   Components: web gui
>     Versions: 0.8-dev
>  Environment: Windows XP, Cygwin, Eclipse, JDK 1.4.1
>     Reporter: KuroSaka TeruHiko
>     Priority: Minor
>  Attachments: NUTCH-145-fix.zip
>
> When I ran the ant build from within Eclipse, it failed on src/web/include/zh/header.xml
> and src/web/pages/zh/*.xml with the error "document does not have a root element"
> (translated from the Japanese message).
> On closer inspection, these files begin with an invisible UTF-8 BOM (byte order mark),
> that is, EF BB BF in hex or \357\273\277 in octal.
> Perhaps the JDK 1.4.x UTF-8 converter does not handle the BOM in UTF-8 files. (Note that
> the BOM was originally intended for the UTF-16 and UTF-32 encodings, to self-identify the
> endianness, but Microsoft started using a UTF-8-encoded BOM as a character encoding signature.)
> I also noticed that they use the MS-DOS style end-of-line sequence, CR followed by LF,
> unlike the other ??/*.xml files, which use the UNIX style EOL.
> Fixed files are available.
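
The two problems described above (a leading EF BB BF signature and CR LF line
endings) can be checked and repaired mechanically. The following is only a
minimal sketch in Python, not the method used to produce the attached fix; the
function names has_utf8_bom, clean_xml_bytes and clean_file are hypothetical
and introduced here purely for illustration.

```python
import codecs

# The UTF-8 BOM: EF BB BF in hex, i.e. \357\273\277 in octal.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def clean_xml_bytes(data: bytes) -> bytes:
    """Strip a leading UTF-8 BOM and convert CR LF line endings to LF."""
    if has_utf8_bom(data):
        data = data[len(UTF8_BOM):]
    return data.replace(b"\r\n", b"\n")


def clean_file(path: str) -> bool:
    """Rewrite the file in place if needed; return True if it was changed."""
    with open(path, "rb") as f:
        original = f.read()
    cleaned = clean_xml_bytes(original)
    if cleaned != original:
        with open(path, "wb") as f:
            f.write(cleaned)
        return True
    return False
```

Assuming a checkout of the source tree, clean_file could be called on
src/web/include/zh/header.xml and on each file matching src/web/pages/zh/*.xml.
Working on raw bytes rather than decoded text keeps the Chinese content
untouched; only the three signature bytes and the CR bytes before LF are removed.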

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

