tika-dev mailing list archives

From "Boopathi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TIKA-2403) Elasticsearch 5.2.2 - Ingest Node - PDF - Parsing Issue
Date Thu, 29 Jun 2017 13:32:00 GMT

     [ https://issues.apache.org/jira/browse/TIKA-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boopathi updated TIKA-2403:
---------------------------
    Description: We are using Elasticsearch 5.2.2 for full-text search. With the help of an
ingest node we are able to parse the content of the file types that Tika supports. We are facing
an issue while parsing the content of a PDF file. The file's content is parsed successfully,
but the output also contains additional terms that do not appear in the document.
[sample screen shot|https://www.screencast.com/t/AQWK9Rzvrdo8]. Kindly let me know the
reason for this and how it can be fixed.  (was: We are using Elasticsearch 5.2.2 for full-text
search. With the help of an ingest node we are able to parse the content of the file types that
Tika supports. We are fixing an issue while parsing the content of a PDF file. The file's
content is parsed successfully, but the output also contains additional terms that do not
appear in the document. [sample screen shot|https://www.screencast.com/t/AQWK9Rzvrdo8].
Kindly let me know the reason for this and how it can be fixed.)

> Elasticsearch 5.2.2 - Ingest Node - PDF - Parsing Issue
> -------------------------------------------------------
>
>                 Key: TIKA-2403
>                 URL: https://issues.apache.org/jira/browse/TIKA-2403
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Boopathi
>
> We are using Elasticsearch 5.2.2 for full-text search. With the help of an ingest node
we are able to parse the content of the file types that Tika supports. We are facing an issue
while parsing the content of a PDF file. The file's content is parsed successfully, but the
output also contains additional terms that do not appear in the document. [sample screen
shot|https://www.screencast.com/t/AQWK9Rzvrdo8]. Kindly let me know the reason for this
and how it can be fixed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
