tika-dev mailing list archives

From Avi Hayun <avrah...@gmail.com>
Subject Wrong parsing of XML
Date Fri, 11 Jul 2014 15:01:35 GMT
Hi,


Scenario:
1. I use tika-core in my app
2. I use the following to detect the stream's media type:

byte[] bytes = IOUtils.toByteArray(new URL("http://www.amazon.com/sitemap_video.xml"));
String contentType = new Tika().detect(bytes);


Looking at the sitemap, it is obviously of type application/xml.




BUT

Tika returns a content type of text/plain instead of application/xml!?



Upon debugging, I end up in the following method:
CompositeDetector.detect(InputStream input, Metadata metadata)...


Which returns the wrong content type.
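
Since detection from a bare byte[] can only go by the content itself, one way to narrow this down might be to look at the first bytes of the download. The following is a minimal sketch, not part of Tika: the class LeadingBytes and both helper methods are hypothetical names, and the check only tests for a literal "<?xml" prefix (optionally after a UTF-8 byte order mark). Whether Tika's detector accepts exactly the same prefixes is not something this sketch verifies.

```java
import java.nio.charset.StandardCharsets;

public class LeadingBytes {

    // Hypothetical helper: true if the data begins with "<?xml",
    // optionally preceded by a UTF-8 byte order mark (EF BB BF).
    static boolean startsWithXmlDeclaration(byte[] data) {
        int offset = 0;
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            offset = 3; // skip the BOM
        }
        byte[] marker = "<?xml".getBytes(StandardCharsets.US_ASCII);
        if (data.length - offset < marker.length) {
            return false;
        }
        for (int i = 0; i < marker.length; i++) {
            if (data[offset + i] != marker[i]) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: space-separated hex dump of the first
    // `limit` bytes, to eyeball whitespace, a BOM, or compressed data.
    static String hexPrefix(byte[] data, int limit) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(limit, data.length);
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in sample; in the scenario above this would be `bytes`.
        byte[] sample = "<?xml version=\"1.0\"?><urlset/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(hexPrefix(sample, 8));
        System.out.println(startsWithXmlDeclaration(sample));
    }
}
```

If the check comes back false for the downloaded bytes, the hex dump should show what the response actually starts with.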





Does anyone have any idea how to solve it?
