tika-dev mailing list archives

From "Kathrine Colyn (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1474) PackageParser leaves 7zip Temp Files behind
Date Sat, 21 Feb 2015 08:38:11 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330082#comment-14330082
] 

Kathrine Colyn commented on TIKA-1474:
--------------------------------------

If I pass a 7z input stream to the Tika parser, Tika creates a temp file in PackageParser:
> {code}
> ArchiveInputStream ais;
> try {
> ArchiveStreamFactory factory = context.get(
> ArchiveStreamFactory.class, new ArchiveStreamFactory());
> ais = factory.createArchiveInputStream(stream);
> } catch (StreamingNotSupportedException sne) {
> // Most archive formats work on streams, but a few need files
> if (sne.getFormat().equals(ArchiveStreamFactory.SEVEN_Z)) {
> // Rework as a file, and wrap
> stream.reset();
> TikaInputStream tstream = TikaInputStream.get(stream);

Thanks!

> PackageParser leaves 7zip Temp Files behind
> -------------------------------------------
>
>                 Key: TIKA-1474
>                 URL: https://issues.apache.org/jira/browse/TIKA-1474
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Fabian Lange
>
> If I pass a 7z input stream to the Tika parser, Tika creates a temp file in PackageParser:
> {code}
>         ArchiveInputStream ais;
>         try {
>             ArchiveStreamFactory factory = context.get(
>                     ArchiveStreamFactory.class, new ArchiveStreamFactory());
>             ais = factory.createArchiveInputStream(stream);
>         } catch (StreamingNotSupportedException sne) {
>             // Most archive formats work on streams, but a few need files
>             if (sne.getFormat().equals(ArchiveStreamFactory.SEVEN_Z)) {
>                 // Rework as a file, and wrap
>                 stream.reset();
>                 TikaInputStream tstream = TikaInputStream.get(stream);
>                 
>                 // Pending a fix for COMPRESS-269, this bit is a little nasty
>                 ais = new SevenZWrapper(new SevenZFile(tstream.getFile()));
>             } else {
>                 throw new TikaException("Unknown non-streaming format " + sne.getFormat(), sne);
>             }
>         } catch (ArchiveException e) {
>             throw new TikaException("Unable to unpack document stream", e);
>         }
> {code}
> tstream.getFile() will then internally make a new temp file:
> {code}
>                 // Spool the entire stream into a temporary file
>                 file = tmp.createTemporaryFile();
>                 OutputStream out = new FileOutputStream(file);
> {code}
> This file is not deleted because SevenZWrapper does not close the SevenZFile.
> This can be fixed by implementing the following close method in SevenZWrapper:
> {code}
> @Override
> public void close() throws IOException {
>     try {
>         // Close the underlying SevenZFile so its file handle is released
>         file.close();
>     } finally {
>         super.close();
>     }
> }
> {code}
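
For anyone who wants to try the idea outside of Tika, here is a minimal, self-contained sketch of the close-delegation pattern proposed above. It uses only the JDK. FileBackedArchive and ClosingWrapper are hypothetical stand-ins for SevenZFile and SevenZWrapper, not the real classes, and the temp file is created and deleted by the sketch itself rather than by TikaInputStream.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClosingWrapperSketch {

    /** Hypothetical stand-in for SevenZFile: holds a file handle until closed. */
    static class FileBackedArchive implements Closeable {
        private final RandomAccessFile raf;
        private boolean closed = false;

        FileBackedArchive(Path path) throws IOException {
            this.raf = new RandomAccessFile(path.toFile(), "r");
        }

        int read() throws IOException {
            return raf.read();
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() throws IOException {
            closed = true;
            raf.close();
        }
    }

    /** Hypothetical stand-in for SevenZWrapper, with the proposed close(). */
    static class ClosingWrapper extends InputStream {
        private final FileBackedArchive file;

        ClosingWrapper(FileBackedArchive file) {
            this.file = file;
        }

        @Override
        public int read() throws IOException {
            return file.read();
        }

        @Override
        public void close() throws IOException {
            try {
                // Release the handle on the backing file first
                file.close();
            } finally {
                super.close();
            }
        }
    }

    /** Returns true if closing the wrapper released the handle and the temp file was deleted. */
    static boolean closeReleasesTempFile() throws IOException {
        // Assumption: this temp file plays the role of the spooled 7z stream
        Path tmp = Files.createTempFile("closing-wrapper-sketch-", ".tmp");
        Files.write(tmp, new byte[] {1, 2, 3});

        FileBackedArchive archive = new FileBackedArchive(tmp);
        try (InputStream in = new ClosingWrapper(archive)) {
            in.read();
        }

        // With the handle released, the temp file can be removed
        Files.delete(tmp);
        return archive.isClosed() && !Files.exists(tmp);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("handle released and temp file deleted: " + closeReleasesTempFile());
    }
}
```

The try/finally in close() mirrors the proposed fix: the backing resource is closed first, and super.close() still runs if that call throws.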



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
