tika-dev mailing list archives

From "Sandeepan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2261) TikaOcr giving different result across platforms
Date Thu, 09 Feb 2017 04:21:41 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858992#comment-15858992 ]

Sandeepan commented on TIKA-2261:
---------------------------------

[~tallison@mitre.org]

Where do I find rotation.py? Could you please point me to its PyPI location? I am not able
to figure out which package it is.

> TikaOcr giving different result across platforms
> ------------------------------------------------
>
>                 Key: TIKA-2261
>                 URL: https://issues.apache.org/jira/browse/TIKA-2261
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.14
>            Reporter: Sandeepan
>         Attachments: 4.png
>
>
> Hi,
> I am using Tika to parse every type of file, and it works great for non-image files.
> My local machine is a Mac, but I deploy on Ubuntu 14.04. On the command line, I get
the same result on both platforms.
> Example command:
> tesseract 3.jpg ouput -l eng -psm 1 txt
> But when I run it through Java code, the results are very different, and the quality
is worse on Ubuntu.
> Sample Code
>         AutoDetectParser parser = new AutoDetectParser();
>         BodyContentHandler handler = new BodyContentHandler(-1);
>         Metadata metadata = new Metadata();
>         FileInputStream in = new FileInputStream(path);
>         parser.parse(in, handler, metadata);
>         parsedText = handler.toString();
> On Mac:
> ++++++
> $ tesseract -v
> tesseract 3.04.01
>  leptonica-1.74.1
>   libjpeg 8d : libpng 1.6.28 : libtiff 4.0.7 : zlib 1.2.8
> On Ubuntu:
> ubuntu@ubuntu-4gb-postprocess:~$ tesseract -v
> tesseract 3.04.01
>  leptonica-1.74.1
>   libjpeg 8d : libpng 1.6.28 : libtiff 4.0.7 : zlib 1.2.8
> I am not able to figure out what the issue is.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
