tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1138) Empty body and empty title with some XLS and TXT documents
Date Mon, 24 Jun 2013 14:40:20 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692044#comment-13692044 ]

Nick Burch commented on TIKA-1138:
----------------------------------

I've just tried another Excel file from that list, and it's another Excel 95 one. A quick way to
check is `java -classpath /path/to/main/poi/jar org.apache.poi.poifs.dev.POIFSLister problem.xls`:
Excel 97+ files have a Workbook entry, while Excel 95 files have a Book entry instead.
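
As a rough illustration of that rule, here is a minimal sketch in plain Java. It assumes you have already collected the top-level entry names of the OLE2 file (for example by reading them off the POIFSLister output); the class name `ExcelFormatCheck`, the `Format` enum and the `classify` method are made up for this example and are not part of POI or Tika. Preferring Workbook when both entries are present is also an assumption on my part.

```java
import java.util.Set;

// Hypothetical helper, not part of POI or Tika: applies the
// "Workbook entry vs Book entry" rule to a set of entry names.
public class ExcelFormatCheck {

    enum Format { EXCEL_97_PLUS, EXCEL_95, UNKNOWN }

    // rootEntryNames: names of the top-level entries in the OLE2 container,
    // gathered by whatever means you have (e.g. copied from POIFSLister output).
    static Format classify(Set<String> rootEntryNames) {
        // Assumption: if both entries exist, treat the file as Excel 97+.
        if (rootEntryNames.contains("Workbook")) {
            return Format.EXCEL_97_PLUS;
        }
        if (rootEntryNames.contains("Book")) {
            return Format.EXCEL_95;
        }
        // Neither entry found: probably not an Excel workbook at all.
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        // Example entry names, typed in by hand for illustration only.
        System.out.println(classify(Set.of("Book")));
        System.out.println(classify(Set.of("Workbook")));
    }
}
```

A file that comes back as `EXCEL_95` here would be in the same group as the two Excel 95 files already found in that list.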

participants.current.txt is being mis-detected as message/news, which is why you're not getting
any text back. I think the mime magic for that type might need tightening, but I don't know anything
about the format, so I'm not sure what we should be changing it to.
                
> Empty body and empty title with some XLS and TXT documents
> ----------------------------------------------------------
>
>                 Key: TIKA-1138
>                 URL: https://issues.apache.org/jira/browse/TIKA-1138
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.3
>         Environment: Windows 7
>            Reporter: Koutsoulis Philippe
>
> *No error in logs*
> *+Extract from my "Structured Text":+*
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> ...
> <title/>
> </head>
> <body/></html>
> {noformat}
> *+Files to reproduce+*
> [http://www.justice.gouv.fr/art_pix/declaration_sexe_20091016.xls]
> [http://ge.ch/ssco_gestats/excel/deinfo_par_ht2004.xls]
> [http://homepage.swissonline.ch/ccvaf1/stock_divers/palmares_ccvaf.xls]
> [http://ge.ch/ssco_gestats/excel/refona_par_ht2006.xls]
> [http://www.pfynschiessen.ch/TClassementgroupeinvite.xls]
> [http://top1000.anthologeek.net/participants.current.txt]
> [http://www.gregdonner.org/workbench/wb_31rev.txt]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira