xml-xindice-users mailing list archives

From Olle Olsson <ol...@sics.se>
Subject Apache Xindice - export/import - encodings lost
Date Mon, 21 Mar 2005 16:50:17 GMT
About:    xindice-1.1b4 - command-line tool
Question: How can encodings be preserved across export/import?

--------------------------------
Scenario explaining the problem
--------------------------------
Step 1.
Adding document:
     xindice ad -c xmldb:xindice://localhost:8080/db/foo/ -n 36.xml -f 36.xml
on document 36.xml:
     <?xml version="1.0" encoding="iso-8859-1"?> ... etc ...
This works OK

Step 2.
Retrieving document:
     xindice rd -c xmldb:xindice://localhost:8080/db/foo/ -n 36.xml -f 36a.xml
results in document 36a.xml:
     <?xml version="1.0"?> ... etc ...
The payload of the extracted document (36a.xml) is essentially identical 
to that of the original document (36.xml), but the encoding declaration 
is gone.

Step 3.
Adding document:
     xindice ad -c xmldb:xindice://localhost:8080/db/foo/ -n 36a.xml -f 36a.xml
on document 36a.xml above.

This results in an error:
     ERROR : Invalid byte 2 of 3-byte UTF-8 sequence.

--------------------------------
What is happening here?
--------------------------------

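The cause is that the source document of step 1 contains an ISO-8859-1 
character (which is why the document declares its encoding explicitly). 
The export in step 2 drops that declaration, and an XML document without 
an encoding declaration is read as UTF-8 by default, so the re-import in 
step 3 rejects the character.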
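
To illustrate the failure outside Xindice, here is a minimal Python sketch. 
It does not use Xindice at all; it only shows that ISO-8859-1 bytes stop 
being parseable once the encoding declaration is removed. The sample 
content ("på väg") is an assumption, since the real 36.xml is not shown, 
and the helper name is_well_formed is made up for this example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical stand-in for 36.xml: declared as ISO-8859-1 and containing
# two non-ASCII characters (\xe5 and \xe4).
SAMPLE = '<?xml version="1.0" encoding="iso-8859-1"?><doc>p\xe5 v\xe4g</doc>'

# Bytes as they would be stored on disk for the original document.
declared = SAMPLE.encode("iso-8859-1")

# Simulate step 2: same bytes, but the encoding declaration has been dropped.
undeclared = declared.replace(b' encoding="iso-8859-1"', b"")


def is_well_formed(data: bytes) -> bool:
    """Return True if the parser accepts the bytes, False on a parse error."""
    try:
        minidom.parseString(data)
        return True
    except ExpatError:
        return False


print(is_well_formed(declared))    # declaration present
print(is_well_formed(undeclared))  # declaration missing, parser assumes UTF-8
```

The first document should parse and the second should be rejected, which 
mirrors the difference between step 1 and step 3.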

This is a practical problem.
In the off-the-shelf installation of Xindice 1.0 it did not arise; 
that tool seems to be more tolerant.
How can one make this work in Xindice 1.1?
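
One possible workaround, sketched in Python and not tested against 
Xindice, is to put the encoding declaration back into the exported file 
before re-importing it. This assumes the exported file still contains 
ISO-8859-1 bytes, which the UTF-8 error in step 3 suggests but does not 
prove. The function name restore_declaration and the output file name 
36b.xml are made up for this example.

```python
import re

# Matches an XML declaration at the very start of the data.
XML_DECL = re.compile(rb'\A<\?xml\s[^>]*\?>')


def restore_declaration(data: bytes, encoding: str = "iso-8859-1") -> bytes:
    """Replace (or prepend) the XML declaration so it names the given encoding.

    Assumption: the payload bytes really are in that encoding; this function
    only rewrites the declaration and does not transcode anything.
    """
    decl = b'<?xml version="1.0" encoding="' + encoding.encode("ascii") + b'"?>'
    match = XML_DECL.match(data)
    if match:
        # Keep everything after the old declaration untouched.
        return decl + data[match.end():]
    # No declaration at all: add one on its own line.
    return decl + b"\n" + data


# Example use (36a.xml is the export from step 2; 36b.xml is hypothetical):
#   with open("36a.xml", "rb") as src:
#       fixed = restore_declaration(src.read())
#   with open("36b.xml", "wb") as dst:
#       dst.write(fixed)
```

The repaired file could then be added with the same "xindice ad" command 
as in step 3. Whether the declaration survives the next export is a 
separate question that this sketch does not address.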

--------------------------------
Run-time environment
--------------------------------
  
The Xindice server runs in an off-the-shelf Tomcat/Cocoon framework:
 - Tomcat 4.1.12
 - Cocoon 2.1.6


-- 
------------------------------------------------------------------
Olle Olsson   olleo@sics.se   Tel: +46 8 633 15 19  Fax: +46 8 751 72 30
	[Svenska W3C-kontoret: olleo@w3.org]
SICS [Swedish Institute of Computer Science]
Box 1263
SE - 164 29 Kista
Sweden
------------------------------------------------------------------


