lucene-dev mailing list archives

From: Daniel Naber
Subject: Re: CJK Support for HTMLParser.jj
Date: Tue, 07 Sep 2004 20:19:13 GMT
On Monday 23 August 2004 13:46, Joey Lawrance wrote:

> I've attached the HTMLParser.jj file that successfully parses Japanese
> HTML for indexing.


Thanks for the patch. When I compile it with "ant javacc-HTMLParser" I get
this warning:

"Warning: Line 364, Column 3: Non-ASCII characters used in regular expression.
Please make sure you use the correct Reader when you create the parser that
can handle your character set."

Is it okay to get this warning? The line the warning refers to is this one:

| < CJK:                                          // non-alphabets

Besides that, the patch seems to work, i.e. the parser doesn't stop on 
Japanese HTML files anymore, but that's all I can say, as I don't speak
Japanese.
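
For what it's worth, the "correct Reader" the warning asks for could look
roughly like the sketch below. It assumes the input is UTF-8 encoded (a
Japanese page might just as well be Shift_JIS or EUC-JP), and the class and
method names (CharsetReaderSketch, openReader, readAll) are made up for
illustration; they are not part of the patch or of the generated parser.
readAll only stands in for handing the Reader to the parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetReaderSketch {

    // Wrap a byte stream in a Reader that decodes with an explicitly chosen
    // charset instead of the platform default.
    static Reader openReader(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset);
    }

    // Drain the Reader into a String. Hypothetical stand-in for passing the
    // Reader on to a parser.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Three CJK characters written as Unicode escapes so that this source
        // file itself stays pure ASCII.
        String original = "<p>\u65E5\u672C\u8A9E</p>";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in round-trips.
        String decodedOk = readAll(
            openReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));

        // Decoding with a mismatched charset garbles the CJK characters.
        String decodedWrong = readAll(
            openReader(new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1));

        System.out.println(decodedOk.equals(original));    // prints "true"
        System.out.println(decodedWrong.equals(original)); // prints "false"
    }
}
```

The point is only that the charset is chosen explicitly where the Reader is
created; how that Reader gets handed to the generated HTMLParser is left out
here, since it depends on the constructors the grammar produces.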


