tika-dev mailing list archives

From "Matthew Caruana Galizia (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2174) JP2 and JPX (JPEG 2000) support not declared by TesseractOCRParser
Date Wed, 09 Nov 2016 11:59:58 GMT
Matthew Caruana Galizia created TIKA-2174:
---------------------------------------------

             Summary: JP2 and JPX (JPEG 2000) support not declared by TesseractOCRParser
                 Key: TIKA-2174
                 URL: https://issues.apache.org/jira/browse/TIKA-2174
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.14
            Reporter: Matthew Caruana Galizia


Tesseract produces OCR output fine for JPX images as of this version:

{noformat}
$ tesseract -v
tesseract 3.04.01
 leptonica-1.73
  libjpeg 8d : libpng 1.6.26 : libtiff 4.0.6 : zlib 1.2.5
{noformat}

However, these types are not declared by getSupportedTypes, so no OCR output is produced
for PDFs that contain JPX images of scanned documents, for example.
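
To illustrate the effect, here is a minimal stand-alone sketch. It is not Tika code: the class, field, and method names are hypothetical, media types are modelled as plain strings, and the "already declared" set is an assumed example rather than the parser's actual list. It only shows why an image type missing from the declared set never reaches the OCR step, and how adding image/jp2 and image/jpx would change that.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Stand-alone model of a parser's declared-types set.
// Hypothetical names throughout; this does not use the Tika API.
class SupportedTypesSketch {

    // Assumed example of image types an OCR parser might already declare.
    static final Set<String> DECLARED = Collections.unmodifiableSet(
            new LinkedHashSet<>(Arrays.asList(
                    "image/png", "image/jpeg", "image/tiff", "image/gif")));

    // Registered media types for JPEG 2000: JP2 and JPX.
    static final Set<String> JPEG_2000 = Collections.unmodifiableSet(
            new LinkedHashSet<>(Arrays.asList("image/jp2", "image/jpx")));

    // Returns a new set: the declared types plus the JPEG 2000 types.
    static Set<String> withJpeg2000() {
        Set<String> all = new LinkedHashSet<>(DECLARED);
        all.addAll(JPEG_2000);
        return Collections.unmodifiableSet(all);
    }

    // Models the dispatch rule: an embedded image is only handed to the
    // parser when the parser declares that image's media type.
    static boolean wouldBeOcred(Set<String> declared, String mediaType) {
        return declared.contains(mediaType);
    }

    public static void main(String[] args) {
        // JPX image embedded in a PDF, before and after declaring the types.
        System.out.println(wouldBeOcred(DECLARED, "image/jpx"));
        System.out.println(wouldBeOcred(withJpeg2000(), "image/jpx"));
    }
}
```

In this model the first check fails and the second succeeds, which mirrors the behaviour described above: Tesseract itself can read the image, but the parser is never asked to.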



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
