lucene-java-user mailing list archives

From: mchaput
Subject: HTMLParser choking on Unicode
Date: Tue, 08 Apr 2003 16:54:37 GMT

When I try to index Japanese HTML files using HTMLParser, I just get
"lexical errors" in every file:

   Parse Aborted: Lexical error at line 12, column 28.
   Encountered: "\u2030" (8240), after : ""

Is there something I have to do to make HTMLParser work with Unicode?

I haven't done anything special with readers or encodings (I don't really
know much about them). Is that the problem?
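
One possible reading of the error, offered as a sketch and not a confirmed
diagnosis: U+2030 (8240) is the character the byte 0x89 turns into when it is
decoded as windows-1252, and 0x89 is a valid lead byte in Shift_JIS. If the
files are Shift_JIS and are being read with a Western default encoding, the
parser would see stray symbols instead of Japanese text. The example below
decodes the same two bytes both ways using only standard java.io and
java.nio classes. The class name EncodingSketch, the helper decode, and the
sample bytes are hypothetical, and the actual encoding of the files is an
assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class EncodingSketch {

    // Decode the bytes through an InputStreamReader that uses the named
    // charset, which is the same wrapping one would apply to a file stream.
    static String decode(byte[] bytes, String charsetName) throws IOException {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: one double-byte Shift_JIS character whose
        // lead byte is 0x89.
        byte[] sample = {(byte) 0x89, (byte) 0xEF};

        // Wrong charset: each byte becomes its own Western character.
        String wrong = decode(sample, "windows-1252");
        System.out.println((int) wrong.charAt(0)); // prints 8240

        // Assumed correct charset: the two bytes form a single character.
        String right = decode(sample, "Shift_JIS");
        System.out.println(right.length()); // prints 1
    }
}
```

If this is the cause, building the Reader with an explicit charset before the
text reaches the parser would be the thing to try. Whether HTMLParser accepts
a Reader depends on the version in use.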



Matt Chaput           |   A l i a s | W a v e f r o n t
Information Designer  |   210 King St. E. Toronto, ON, Canada M5A 1J7    |   (416) 874-8268
"A goddamned ray of sunshine all the goddamned time" --Sparkle Hayter
