tika-dev mailing list archives

From "VENU (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2351) Getting error while parsing documents
Date Tue, 02 May 2017 12:38:04 GMT
VENU created TIKA-2351:
--------------------------

             Summary: Getting error while parsing documents
                 Key: TIKA-2351
                 URL: https://issues.apache.org/jira/browse/TIKA-2351
             Project: Tika
          Issue Type: Bug
          Components: general
    Affects Versions: 1.14
         Environment: Red Hat Enterprise Linux Server release 7.3
ElasticSearch 5.2.1
ingest-attachment 5.2.1
            Reporter: VENU


Hi Everyone,

I am using the ingest-attachment plugin to index documents. Plain-text (.txt) files parse
correctly, but when I try to parse .doc or .pdf files I get the following error.

FILE = /elastic/files/englishAnalyzer.doc
ID = 6

"error" : {
"root_cause" : [
{
"type" : "exception",
"reason" : "java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1]; ",
"header" : {
"processor_type" : "attachment"
}
}
],
"type" : "exception",
"reason" : "java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1];",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1];",
"caused_by" : {
"type" : "parse_exception",
"reason" : "Error parsing document in field [data]",
"caused_by" : {
"type" : "tika_exception",
"reason" : "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079",
"caused_by" : {
"type" : "array_index_out_of_bounds_exception",
"reason" : "-1"
}
}
}
},
"header" : {
"processor_type" : "attachment"
}
},
"status" : 500
}
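
For anyone triaging a response like the one above, the innermost "caused_by" entry is usually the most useful part. The following is a minimal Python sketch, not part of the original report, that walks the nested "caused_by" chain and lists each exception type with its reason. The function name `cause_chain` is hypothetical, and the sample dictionary is a trimmed copy of the response above (the two outer, long "reason" strings are left out).

```python
def cause_chain(error):
    """Return (type, reason) pairs from the outermost error to the innermost cause."""
    chain = []
    node = error
    while isinstance(node, dict):
        if "type" in node:
            chain.append((node["type"], node.get("reason", "")))
        # Descend one level; .get() returns None at the innermost cause,
        # which ends the loop.
        node = node.get("caused_by")
    return chain


# Trimmed copy of the "error" object from the response above.
error = {
    "type": "exception",
    "caused_by": {
        "type": "illegal_argument_exception",
        "caused_by": {
            "type": "parse_exception",
            "reason": "Error parsing document in field [data]",
            "caused_by": {
                "type": "tika_exception",
                "reason": "Unexpected RuntimeException from "
                          "org.apache.tika.parser.microsoft.OfficeParser@28992079",
                "caused_by": {
                    "type": "array_index_out_of_bounds_exception",
                    "reason": "-1",
                },
            },
        },
    },
}

chain = cause_chain(error)
for exc_type, reason in chain:
    print(exc_type, "|", reason)
```

The last pair in the list is the root cause reported by the response: the ArrayIndexOutOfBoundsException raised while OfficeParser was handling the .doc file.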

Please help me resolve this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
