tika-dev mailing list archives

From "Paul Hallett (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2794) Tika extracts text from PDF on MacBook, but not on Windows Server
Date Wed, 05 Dec 2018 13:00:00 GMT
Paul Hallett created TIKA-2794:
----------------------------------

             Summary: Tika extracts text from PDF on MacBook, but not on Windows Server
                 Key: TIKA-2794
                 URL: https://issues.apache.org/jira/browse/TIKA-2794
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.19.1
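         Environment: Python, using the tika-python client against a Tika server:

    from tika import parser

    try:
        headers = {'X-Tika-PDFextractInlineImages': 'true'}
        data = parser.from_file(pathtofile, serverEndpoint=self.TIKA_SERVER, headers=headers)
        # Alternative call without the header:
        # data = parser.from_file(pathtofile, self.TIKA_SERVER)

        # data['content'] can be None when the server returns no text.
        charstoreturn = data['content'].strip().split()[:limit]
        charstoreturn = ' '.join(charstoreturn).replace("\n", " ").replace('"', "'").replace(",", "").replace("'", "'")

        return True, charstoreturn
    except Exception as err:
        return False, "error {} on file: {}.\n".format(str(err), pathtofile)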

This code extracts text from the attached PDF file on a MacBook, but not on Windows Server. Why?
            Reporter: Paul Hallett
             Fix For: 2.0.0
         Attachments: test2.pdf
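The report does not include the error string returned on Windows Server, so what follows is only a guess at one failure mode. When the Tika server extracts no text from a file (for example, if the PDF's text exists only as images and the server cannot OCR them), tika-python's from_file() can return None for data['content']; data['content'].strip() then raises AttributeError, which the except branch reports as an error. Below is a minimal sketch of the same truncation and clean-up with an explicit check for that case. The helper name first_words() is hypothetical and is not part of tika-python.

```python
from typing import Optional, Tuple


def first_words(content: Optional[str], limit: int) -> Tuple[bool, str]:
    """Return (ok, text) with at most `limit` words of Tika output.

    Hypothetical helper: `content` is assumed to be data['content'] from
    tika-python's parser.from_file(), which may be None when the server
    extracted no text.
    """
    # split() with no arguments also discards newlines and surrounding
    # whitespace, so no separate strip() or "\n" replacement is needed.
    words = (content or "").split()
    if not words:
        # Make "no text extracted" explicit instead of failing on None.strip().
        return False, "no text extracted"
    # Same clean-up as the original snippet: double quotes become single
    # quotes and commas are removed.
    text = " ".join(words[:limit]).replace('"', "'").replace(",", "")
    return True, text
```

Inside the original method this would be called as `ok, text = first_words(data['content'], limit)`. It only makes the empty-output case visible on both machines; it does not explain why the Windows server would return no text for test2.pdf.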

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)