tika-dev mailing list archives

From "Marichi Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TIKA-2908) TikaException: Failed to close temporary resource - how to fix?
Date Wed, 17 Jul 2019 20:39:00 GMT

     [ https://issues.apache.org/jira/browse/TIKA-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marichi Gupta updated TIKA-2908:
--------------------------------
    Description: 
I am using Apache Tika on Windows 10 with JRE 1.8.0_181, and I've imported Tika using Maven with
the following dependencies:

    <dependencies>
      <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-parsers</artifactId>
        <version>1.21</version>
      </dependency>
    </dependencies>

I have the code below for performing OCR using Tesseract (which I have independently tested
and know to be working):

    public static void OCRTest() {
        try {
            BufferedImage im = ImageIO.read(new File(OCR_IMAGE));
            TesseractOCRConfig config = new TesseractOCRConfig();
            config.setTessdataPath("C:\\Program Files\\Tesseract-OCR\\tessdata");
            config.setTesseractPath("C:\\Program Files\\Tesseract-OCR");
            ParseContext parseContext = new ParseContext();
            parseContext.set(TesseractOCRConfig.class, config);
            TesseractOCRParser parser = new TesseractOCRParser();
            BodyContentHandler handler = new BodyContentHandler();
            Metadata metadata = new Metadata();
            try {
                parser.parse(im, handler, metadata, parseContext);
                System.out.println(handler.toString());
            } catch (SAXException e) {
                e.printStackTrace();
            } catch (TikaException e) {
                e.printStackTrace();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
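
A side note on the two path arguments: in a Java string literal every backslash of a Windows path has to be doubled, and a lone backslash in front of a "t" is read as a tab character. The sketch below is only an illustration of that escaping rule using the JDK; the class and constant names are hypothetical and it does not call Tika.

```java
class WindowsPathSketch {

    // Every backslash doubled: the literal holds the intended Windows path.
    static final String ESCAPED = "C:\\Program Files\\Tesseract-OCR\\tessdata";

    // Last separator written with a single backslash: "\t" becomes a tab character.
    static final String WITH_TAB = "C:\\Program Files\\Tesseract-OCR\tessdata";

    // Builds the same path from its segments, so no escape can be mistyped.
    static String joined() {
        return String.join("\\", "C:", "Program Files", "Tesseract-OCR", "tessdata");
    }

    // True if the string contains a tab, which is never valid in these paths.
    static boolean containsTab(String path) {
        return path.indexOf('\t') >= 0;
    }
}
```

Checking a configured path with containsTab before handing it to the OCR configuration is a cheap way to rule this class of typo out.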

I run into the following exception:

    org.apache.tika.exception.TikaException: Failed to close temporary resources
        at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:174)
        at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:251)
        at test.test.App.OCRTest(App.java:46)
        at test.test.App.main(App.java:30)
    Caused by: java.nio.file.FileSystemException: C:\Users\m\AppData\Local\Temp\apache-tika-2643805894084124300.tmp: The process cannot access the file because it is being used by another process.
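
For logging purposes, the nested cause in a trace like this can also be read programmatically instead of through printStackTrace(). The following is a minimal sketch that uses only JDK classes; the class and method names are hypothetical, and it assumes the outer exception carries the java.nio.file.FileSystemException as its cause, as the trace above indicates.

```java
import java.nio.file.FileSystemException;

class RootCauseSketch {

    // Follows getCause() links until the innermost exception is reached.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    // Returns the file named by a FileSystemException root cause, or null
    // when the root cause is some other kind of exception.
    static String lockedFile(Throwable t) {
        Throwable root = rootCause(t);
        if (root instanceof FileSystemException) {
            return ((FileSystemException) root).getFile();
        }
        return null;
    }
}
```

Calling lockedFile(e) inside the catch block for the TikaException would give the path of the temporary file that could not be removed, which can then be checked against the Temp folder.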

The tmp file is present in the Temp folder. I have the source code downloaded and have stepped
through it with the debugger - the error comes from attempting to close the tmp file. There
is another post on this board (https://issues.apache.org/jira/browse/TIKA-1732) where someone
else has run into the same exception, although with the AutoDetectParser and not Tesseract.
Their issue seemed to be a conflict in their imported jars, but I run into this issue even
with only the Apache Tika libraries installed. I have a feeling this is a concurrency issue,
but I can't pinpoint the conflict.
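
The wording of the Windows error suggests that some handle on the temp file is still open when the delete is attempted. Whether that is what happens inside Tika is not confirmed here; the sketch below only illustrates the general ordering issue with plain JDK calls, and all names in it are hypothetical. Deleting after the stream is closed works on every platform, while deleting with the stream still open is expected to fail on Windows and usually succeeds on POSIX systems.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TempFileCleanupSketch {

    // Writes to a temp file, closes the stream, then deletes the file.
    static boolean deleteAfterClose() {
        try {
            Path tmp = Files.createTempFile("cleanup-sketch-", ".tmp");
            try (OutputStream out = new FileOutputStream(tmp.toFile())) {
                out.write(42);
            } // the stream is closed here, before the delete
            Files.delete(tmp);
            return !Files.exists(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Attempts the delete while the stream is still open. The result is
    // platform-dependent, so it is reported rather than assumed.
    static boolean deleteWhileOpen() {
        try {
            Path tmp = Files.createTempFile("cleanup-sketch-", ".tmp");
            try (OutputStream out = new FileOutputStream(tmp.toFile())) {
                out.write(42);
                try {
                    Files.delete(tmp);
                    return true;
                } catch (IOException e) {
                    System.err.println("Delete failed while open: " + e);
                    return false;
                }
            } finally {
                // Runs after the stream is closed; removes any leftover file.
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If deleteWhileOpen() returns false on the affected machine while deleteAfterClose() returns true, the failure is more likely a close-before-delete ordering problem than a race between threads.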

I don't run into this issue when using Tika's AutoDetectParser, only with the TesseractOCRParser.
This is an important part of an application I'm working on, so I would really appreciate any
insights on how to proceed.

  was:
I am using Apache Tika on Windows 10 with JRE 1.8.0_181, and I've imported Tika using Maven with
the following dependencies:

    <dependencies>
      <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-parsers</artifactId>
        <version>1.21</version>
      </dependency>
    </dependencies>

I have the code below for performing OCR using Tesseract (which I have independently tested
and know to be working):

    public static void OCRTest() {
        try {
            BufferedImage im = ImageIO.read(new File(OCR_IMAGE));
            TesseractOCRConfig config = new TesseractOCRConfig();
            config.setTessdataPath("C:\\Program Files\\Tesseract-OCR\\tessdata");
            config.setTesseractPath("C:\\Program Files\\Tesseract-OCR");
            ParseContext parseContext = new ParseContext();
            parseContext.set(TesseractOCRConfig.class, config);
            TesseractOCRParser parser = new TesseractOCRParser();
            BodyContentHandler handler = new BodyContentHandler();
            Metadata metadata = new Metadata();
            try {
                parser.parse(im, handler, metadata, parseContext);
                System.out.println(handler.toString());
            } catch (SAXException e) {
                e.printStackTrace();
            } catch (TikaException e) {
                e.printStackTrace();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

I run into the following exception:

    org.apache.tika.exception.TikaException: Failed to close temporary resources
        at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:174)
        at org.apache.tika.parser.ocr.TesseractOCRParser.parse(TesseractOCRParser.java:251)
        at test.test.App.OCRTest(App.java:46)
        at test.test.App.main(App.java:30)
    Caused by: java.nio.file.FileSystemException: C:\Users\m\AppData\Local\Temp\apache-tika-2643805894084124300.tmp: The process cannot access the file because it is being used by another process.

The tmp file is present in the Temp folder. I have the source code downloaded and have stepped
through it with the debugger - the error comes from attempting to close the tmp file. On the
Apache Tika forums, there is another post here (https://issues.apache.org/jira/browse/TIKA-1732) where
someone else has run into the same exception, although with the AutoDetectParser and not Tesseract.
Their issue seemed to be a conflict in their imported jars, but I run into this issue even
with only the Apache Tika libraries installed. I have a feeling this is a concurrency issue,
but I can't pinpoint the conflict.

I don't run into this issue when using Tika's AutoDetectParser, only with the TesseractOCRParser.
This is an important part of an application I'm working on, so I would really appreciate any
insights on how to proceed.


> TikaException: Failed to close temporary resource - how to fix?
> ---------------------------------------------------------------
>
>                 Key: TIKA-2908
>                 URL: https://issues.apache.org/jira/browse/TIKA-2908
>             Project: Tika
>          Issue Type: Bug
>          Components: ocr, parser
>    Affects Versions: 1.21
>            Reporter: Marichi Gupta
>            Priority: Blocker
>              Labels: ocr, tesseract, tika



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
