tika-dev mailing list archives

From Avi Hayun <avrah...@gmail.com>
Subject Wrong parsing of XML
Date Fri, 11 Jul 2014 15:01:35 GMT

1. I use tika-core in my app
2. I use the following to detect the stream's media type:

byte[] bytes = IOUtils.toByteArray(new URL("http://www.amazon.com/sitemap_
String contentType = new Tika().detect(bytes);

Looking at the sitemap, it is obviously of type application/xml.


Tika returns a content type of text/plain instead of application/xml!?

While debugging, I traced the call to the following method:
CompositeDetector.detect(InputStream input, Metadata metadata)...

This is where the wrong content type is returned.

Does anyone have any idea how to solve it?
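
One way to narrow this down might be to check what the downloaded bytes actually start with before they are handed to Tika, since content-based detection generally depends on the leading bytes of the stream. The following is only a rough sketch using the plain JDK: the class XmlPrefixCheck and its method startsWithXmlDeclaration are hypothetical names made up for this illustration, they are not part of Tika, and this is not how Tika's detectors are implemented.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Tika): reports whether a byte prefix
// begins with an XML declaration, ignoring a UTF-8 BOM and leading whitespace.
class XmlPrefixCheck {

    private static final byte[] XML_DECL = "<?xml".getBytes(StandardCharsets.US_ASCII);

    static boolean startsWithXmlDeclaration(byte[] prefix) {
        int i = 0;

        // Skip a UTF-8 byte order mark (EF BB BF) if one is present.
        if (prefix.length >= 3
                && (prefix[0] & 0xFF) == 0xEF
                && (prefix[1] & 0xFF) == 0xBB
                && (prefix[2] & 0xFF) == 0xBF) {
            i = 3;
        }

        // Skip leading ASCII whitespace.
        while (i < prefix.length
                && (prefix[i] == ' ' || prefix[i] == '\t'
                    || prefix[i] == '\r' || prefix[i] == '\n')) {
            i++;
        }

        // Not enough bytes left to hold "<?xml".
        if (prefix.length - i < XML_DECL.length) {
            return false;
        }

        // Compare the next bytes against "<?xml".
        for (int j = 0; j < XML_DECL.length; j++) {
            if (prefix[i + j] != XML_DECL[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up sample input, not the actual sitemap content.
        byte[] sample = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><sitemapindex/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(startsWithXmlDeclaration(sample));
    }
}
```

If a check like this returns false for the downloaded bytes, the data may not be raw XML at the byte level (for example, it could be compressed or an error page), which would be worth ruling out before looking deeper into the detector.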
