tika-dev mailing list archives

From "Damiano (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1715) Save embedded images into another location
Date Tue, 18 Aug 2015 12:51:46 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701203#comment-14701203 ]

Damiano commented on TIKA-1715:
-------------------------------

I would also add that the code is extremely slow. The file is only 400 kB, yet parsing it
takes around 22 seconds!

> Save embedded images into another location
> ------------------------------------------
>
>                 Key: TIKA-1715
>                 URL: https://issues.apache.org/jira/browse/TIKA-1715
>             Project: Tika
>          Issue Type: Test
>          Components: metadata
>    Affects Versions: 1.10
>            Reporter: Damiano
>              Labels: newbie
>
> Hello,
> I am having a strange problem dealing with embedded images.
> This is my code:
> {code:java}
>     public void getImages() throws IOException, TikaException, SAXException {
>         
>         try (InputStream stream = new FileInputStream(this.fileName)) {
>             RecursiveParserWrapper p = new RecursiveParserWrapper(
>                 new AutoDetectParser(),
>                 new BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE.IGNORE, -1)
>             );            
>             
>             ParseContext context = new ParseContext();
>             PDFParserConfig config = new PDFParserConfig();
>             config.setExtractInlineImages(true);
>             config.setExtractUniqueInlineImagesOnly(true);
>             context.set(org.apache.tika.parser.pdf.PDFParserConfig.class, config);
>             context.set(org.apache.tika.parser.Parser.class, p);            
>             
>             p.parse(stream, new BodyContentHandler(-1), new Metadata(), context);
>             
>             List<Metadata> metadatas = p.getMetadata();
>                         
>             FileInputStream f = new FileInputStream("/tmp/" + metadatas.get(1).get("File Name"));
>             //FileInputStream f = new FileInputStream(metadatas.get(1).get("File Name"));
>             
>             System.out.println(f.available());
>         }
>     }
> {code}
> I can get the names of the embedded images with get("File Name"), but the path seems invalid.
> I need to save all the embedded images (inline images) to another location.
> Thank you in advance!
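
One possible reading of the problem above: the value returned by get("File Name") is presumably the name of the resource inside the PDF, not a file that exists on disk, so opening it under /tmp/ would fail. A minimal sketch of the saving step follows. It uses only the JDK and assumes the caller already has each embedded image as an InputStream; the class EmbeddedImageSaver and its methods are hypothetical names, not part of Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper (not a Tika class): writes embedded streams into one directory. */
class EmbeddedImageSaver {

    private final Path outputDir;
    private int counter = 0;

    EmbeddedImageSaver(Path outputDir) throws IOException {
        // Create the target directory (and any missing parents) up front.
        Files.createDirectories(outputDir);
        this.outputDir = outputDir;
    }

    /** Keeps only the last path segment; falls back to a generated name. */
    static String safeName(String resourceName, int index) {
        if (resourceName == null) {
            return "embedded-" + index;
        }
        int cut = Math.max(resourceName.lastIndexOf('/'), resourceName.lastIndexOf('\\'));
        String base = resourceName.substring(cut + 1);
        if (base.isEmpty() || base.equals(".") || base.equals("..")) {
            return "embedded-" + index;
        }
        return base;
    }

    /** Copies the stream into the output directory and returns the written path. */
    Path save(InputStream stream, String resourceName) throws IOException {
        Path target = outputDir.resolve(safeName(resourceName, counter++));
        // Overwrite an existing file of the same name instead of failing.
        Files.copy(stream, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}
```

In a Tika setup, save(...) would most likely be called from the parseEmbedded method of a custom org.apache.tika.extractor.EmbeddedDocumentExtractor registered in the ParseContext, with the name taken from the embedded document's metadata. That wiring is an assumption here and is not shown.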



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
