tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2351) Getting error while parsing documents
Date Tue, 02 May 2017 13:00:06 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992845#comment-15992845 ]

Nick Burch commented on TIKA-2351:
----------------------------------

I've just tried with a recent nightly build, and no error was reported. So it looks like
this bug has already been fixed in Apache POI, and the fix will be included in Tika 1.15.

> Getting error while parsing documents
> -------------------------------------
>
>                 Key: TIKA-2351
>                 URL: https://issues.apache.org/jira/browse/TIKA-2351
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.14
>         Environment: Red Hat Enterprise Linux Server release 7.3
> ElasticSearch 5.2.1
> ingest-attachment 5.2.1
>            Reporter: VENU
>              Labels: starter
>         Attachments: 01 - Templete.txt, 02 - Pipeline.txt, 03 - Json_creat_code.txt, 04 - stackTrace.txt, englishAnalyzer.doc
>
>
> Hi Everyone,
> I am using ingest-attachment to index documents. I can parse text documents (.txt files), but when I try to parse .doc or PDF files I get this error:
> FILE = /elastic/files/englishAnalyzer.doc
> ID = 6
> "error" : {
> "root_cause" : [
> {
> "type" : "exception",
> "reason" : "java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1]; ",
> "header" : {
> "processor_type" : "attachment"
> }
> }
> ],
> "type" : "exception",
> "reason" : "java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1];",
> "caused_by" : {
> "type" : "illegal_argument_exception",
> "reason" : "ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1];",
> "caused_by" : {
> "type" : "parse_exception",
> "reason" : "Error parsing document in field [data]",
> "caused_by" : {
> "type" : "tika_exception",
> "reason" : "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079",
> "caused_by" : {
> "type" : "array_index_out_of_bounds_exception",
> "reason" : "-1"
> }
> }
> }
> },
> "header" : {
> "processor_type" : "attachment"
> }
> },
> "status" : 500
> }
> Please help me resolve this issue.
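
The error response quoted above wraps the real failure in several layers of "caused_by" objects. As a rough illustration only, the sketch below walks that chain to reach the innermost cause. The JSON literal is a trimmed copy of the quoted response with the longer "reason" strings abbreviated, and the helper name cause_chain is hypothetical, not part of Elasticsearch or Tika.

```python
import json

# Trimmed copy of the quoted error response; the two outer "reason"
# strings are abbreviated here for readability.
response = json.loads("""
{
  "error": {
    "type": "exception",
    "reason": "java.lang.IllegalArgumentException: (abbreviated)",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "ElasticsearchParseException (abbreviated)",
      "caused_by": {
        "type": "parse_exception",
        "reason": "Error parsing document in field [data]",
        "caused_by": {
          "type": "tika_exception",
          "reason": "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079",
          "caused_by": {
            "type": "array_index_out_of_bounds_exception",
            "reason": "-1"
          }
        }
      }
    }
  },
  "status": 500
}
""")


def cause_chain(error):
    """Return (type, reason) pairs from the outermost error to the root cause.

    Hypothetical helper: it simply follows nested "caused_by" keys.
    """
    chain = []
    node = error
    while isinstance(node, dict):
        chain.append((node.get("type"), node.get("reason")))
        node = node.get("caused_by")
    return chain


chain = cause_chain(response["error"])
for depth, (kind, reason) in enumerate(chain):
    # Indent each level so the nesting stays visible.
    print("  " * depth + f"{kind}: {reason}")

# The last entry is the innermost cause reported by the parser.
root_type, root_reason = chain[-1]
```

Here the last entry is the ArrayIndexOutOfBoundsException raised inside OfficeParser, which is the part relevant to Tika and POI rather than to Elasticsearch.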



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
