tika-dev mailing list archives

From "Antoni Mylka" <antoni.my...@gmail.com>
Subject Mime type identification of plain text files.
Date Sat, 02 Aug 2008 11:55:24 GMT
Many binary formats begin with magic byte sequences composed of ASCII
characters, e.g.
ZIP files begin with PK
PDF files begin with %PDF-
CHM help files begin with ITSF

Does Tika make any attempt to distinguish plain ASCII text documents
that happen to begin with 'PK' from ZIP files?
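
To make the ambiguity concrete, here is a minimal Java sketch. It is not
Tika's detection code; the class and method names are hypothetical and
exist only for illustration. It compares a two-byte 'PK' check with a
check for the full four-byte ZIP local file header signature
(0x50 0x4B 0x03 0x04), which printable text cannot match because the
third and fourth bytes are control characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of Tika.
public class MagicSketch {

    // Naive check: looks only at the two ASCII bytes "PK".
    public static boolean startsWithPk(byte[] data) {
        return data.length >= 2 && data[0] == 'P' && data[1] == 'K';
    }

    // Stricter check: the full ZIP local file header signature "PK\003\004".
    // Assumption: the archive starts with a local file header; empty or
    // spanned archives use other "PK" signatures and are not handled here.
    public static boolean hasZipLocalHeader(byte[] data) {
        return data.length >= 4
                && data[0] == 0x50 && data[1] == 0x4B
                && data[2] == 0x03 && data[3] == 0x04;
    }

    public static void main(String[] args) {
        byte[] text = "PK stands for something.\n".getBytes(StandardCharsets.US_ASCII);
        byte[] zip = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};

        System.out.println(startsWithPk(text));      // true  (false positive)
        System.out.println(hasZipLocalHeader(text)); // false
        System.out.println(startsWithPk(zip));       // true
        System.out.println(hasZipLocalHeader(zip));  // true
    }
}
```

A longer signature helps for ZIP, but it does not settle the question for
magics such as %PDF- or ITSF that are entirely printable ASCII.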

Antoni Myłka