[ https://issues.apache.org/jira/browse/TIKA-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691929#comment-13691929 ]

Nick Burch commented on TIKA-1138:
----------------------------------

That's often a sign that the parser can't handle them. There's some discussion on the dev list at the moment about how best to report that, but it hasn't concluded.

As an example, solupro.xls is an Excel-95 file, which Apache POI (the library Tika uses for .xls) doesn't handle, which is why you're able to get metadata but not text.

> I got empty body and empty title with some documents
> ----------------------------------------------------
>
> Key: TIKA-1138
> URL: https://issues.apache.org/jira/browse/TIKA-1138
> Project: Tika
> Issue Type: Bug
> Components: general
> Affects Versions: 1.3
> Environment: Windows 7 (my desktop)
> Reporter: Koutsoulis Philippe
> Labels: test
>
> *+Tested version:+* Apache Tika 1.3 (with the Apache Tika GUI)
> Hi all,
> I have empty body and empty title with some documents.
> Do you have an idea?
> *+Extract from my "Structured Text"+*
> {noformat}
>
>
> ...
>
> </head>
> <body/></html>
> {noformat}
> *+Files to reproduce+*
> [http://www.justice.gouv.fr/art_pix/declaration_sexe_20091016.xls]
> [http://ge.ch/ssco_gestats/excel/deinfo_par_ht2004.xls]
> [http://homepage.swissonline.ch/ccvaf1/stock_divers/palmares_ccvaf.xls]
> [http://top1000.anthologeek.net/participants.current.txt]
> [http://ge.ch/ssco_gestats/excel/refona_par_ht2006.xls]
> [http://www.rad.fr/solupro.xls]
> [http://www.pfynschiessen.ch/TClassementgroupeinvite.xls]
> [http://www.gregdonner.org/workbench/wb_31rev.txt]
> (i) No error in logs :(