tika-dev mailing list archives

From Antoni Mylka <antoni.my...@gmail.com>
Subject Charset detection
Date Wed, 09 Dec 2009 15:25:43 GMT
Aperturians, Tika

I was wondering if anyone has any experience with the jchardet library
for charset detection. Does it work? What kinds of documents does it
actually support?

Christiaan has posted an idea to the Aperture tracker about how we could
use jchardet to improve the plain text extractor, but it doesn't seem to
work. Or maybe the Tika guys have already figured it out and I can just
use Tika for this? :)

Antoni Mylka
