tika-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2169) Fix xhtml in combination OCR+metadata extraction from images
Date Mon, 28 Nov 2016 16:50:58 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702461#comment-15702461 ]

Hudson commented on TIKA-2169:
------------------------------

UNSTABLE: Integrated in Jenkins build tika-2.x #177 (See [https://builds.apache.org/job/tika-2.x/177/])
TIKA-2169 fix xhtml in ocr (tallison: rev a47a6993375f4105b16c84872a48b327e213084b)
* (edit) tika-parser-modules/tika-parser-multimedia-module/src/test/java/org/apache/tika/parser/ocr/TesseractOCRParserTest.java
* (edit) tika-core/src/test/java/org/apache/tika/TikaTest.java
* (edit) tika-parser-modules/tika-parser-multimedia-module/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java


> Fix xhtml in combination OCR+metadata extraction from images
> ------------------------------------------------------------
>
>                 Key: TIKA-2169
>                 URL: https://issues.apache.org/jira/browse/TIKA-2169
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>             Fix For: 2.0, 1.15
>
>
> In trunk, I'm getting an embedded html element for the image's metadata when Tesseract is available:
> <html>
> ocr content
>  <html>
>  ...metadata
> </html>
> </html>
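
One way to detect the symptom quoted above is to parse the XHTML output and measure how deeply <html> elements nest: well-formed output should have exactly one level. The sketch below uses only the JDK's SAX parser. The class and method names (NestedHtmlCheck, maxHtmlDepth) and the sample strings are hypothetical and illustrative. This is not the test that was added to TesseractOCRParserTest.java in this commit, and the strings are not real Tika output.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class NestedHtmlCheck {

    /** Returns the deepest nesting level of <html> elements in a well-formed XHTML string. */
    public static int maxHtmlDepth(String xhtml) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Namespace awareness makes localName reliable, with or without the XHTML namespace.
        factory.setNamespaceAware(true);
        final int[] depth = {0};
        final int[] max = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("html".equals(localName)) {
                    depth[0]++;
                    max[0] = Math.max(max[0], depth[0]);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("html".equals(localName)) {
                    depth[0]--;
                }
            }
        };
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
        return max[0];
    }

    public static void main(String[] args) {
        // Shape of the buggy output: a second <html> document nested inside the first.
        String broken = "<html><body>ocr content<html><body>...metadata</body></html></body></html>";
        // Expected shape: metadata in <head>, OCR text in a single <body>.
        String fixed = "<html><head><meta name=\"k\" content=\"v\"/></head><body>ocr content</body></html>";
        System.out.println(maxHtmlDepth(broken)); // 2
        System.out.println(maxHtmlDepth(fixed));  // 1
    }
}
```

A depth greater than 1 indicates the nested-document problem described in this issue.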



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
