tika-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: AutoDetectParser is not parsing UTF-16 content types
Date Wed, 29 Aug 2012 16:24:32 GMT
Hi,

On Wed, Aug 29, 2012 at 6:02 PM, chraj007 <chraj.kool@gmail.com> wrote:
> http://lucene.472066.n3.nabble.com/file/n4004078/test.html

Looks like that file has an incorrect http-equiv declaration:

    <META http-equiv="Content-Type" content="text/html; charset=utf-16">

The encoding of the file is not UTF-16.

Can you file a TIKA issue about this? Tika should be able to
automatically detect the correct encoding and use it if the declared
one is obviously incorrect.
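
To illustrate what "obviously incorrect" could mean here, below is a rough sketch of a plausibility check for a declared UTF-16 charset. It is not Tika's actual detection logic. The class name, method names, the 4096-byte sample size and the one-quarter threshold are all assumptions made up for this example. The idea is that UTF-16 encoded HTML either starts with a byte order mark or contains many zero bytes (every ASCII markup character takes two bytes, one of them zero), while a single-byte or UTF-8 file has neither.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: checks whether a declared
// UTF-16 charset is plausible for the given bytes.
public class DeclaredCharsetCheck {

    /** Returns true if the data starts with a UTF-16 byte order mark. */
    static boolean hasUtf16Bom(byte[] data) {
        if (data.length < 2) {
            return false;
        }
        int b0 = data[0] & 0xFF;
        int b1 = data[1] & 0xFF;
        return (b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE);
    }

    /**
     * Heuristic: accept UTF-16 if there is a BOM, or if at least a quarter
     * of the sampled bytes are zero. The sample size and the threshold are
     * arbitrary values chosen for this sketch.
     */
    static boolean looksLikeUtf16(byte[] data) {
        if (hasUtf16Bom(data)) {
            return true;
        }
        int limit = Math.min(data.length, 4096);
        int zeros = 0;
        for (int i = 0; i < limit; i++) {
            if (data[i] == 0) {
                zeros++;
            }
        }
        return limit > 0 && zeros * 4 >= limit;
    }

    public static void main(String[] args) {
        String html = "<META http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=utf-16\">";

        // Same markup, once as single-byte text and once as real UTF-16.
        byte[] singleByte = html.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf16 = html.getBytes(StandardCharsets.UTF_16LE);

        System.out.println(looksLikeUtf16(singleByte)); // false
        System.out.println(looksLikeUtf16(utf16));      // true
    }
}
```

When a check like this fails, the declared charset would be ignored and the encoding guessed from the content instead. A zero-byte count is a crude signal: it relies on the document containing a fair amount of ASCII markup, so a real detector would need something more robust.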

BR,

Jukka Zitting
