tika-dev mailing list archives

From "Damiano (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-1715) Save embedded images into another location
Date Tue, 18 Aug 2015 12:46:46 GMT
Damiano created TIKA-1715:
-----------------------------

             Summary: Save embedded images into another location
                 Key: TIKA-1715
                 URL: https://issues.apache.org/jira/browse/TIKA-1715
             Project: Tika
          Issue Type: Test
          Components: metadata
    Affects Versions: 1.10
            Reporter: Damiano


Hello,
I am having a strange problem dealing with embedded images.
This is my code:

{code:java}
    public void getImages() throws IOException, TikaException, SAXException {
        

        try (InputStream stream = new FileInputStream(this.fileName)) {

            RecursiveParserWrapper p = new RecursiveParserWrapper(
                new AutoDetectParser(),
                new BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE.IGNORE, -1)
            );            
            
            ParseContext context = new ParseContext();
            PDFParserConfig config = new PDFParserConfig();
            config.setExtractInlineImages(true);
            config.setExtractUniqueInlineImagesOnly(true);
            context.set(org.apache.tika.parser.pdf.PDFParserConfig.class, config);
            context.set(org.apache.tika.parser.Parser.class, p);            
            
            p.parse(stream, new BodyContentHandler(-1), new Metadata(), context);
            
            List<Metadata> metadatas = p.getMetadata();
                        
            FileInputStream f = new FileInputStream("/tmp/" + metadatas.get(1).get("File Name"));
            //FileInputStream f = new FileInputStream(metadatas.get(1).get("File Name"));
            
            System.out.println(f.available());
        }
    }
{code}

I can get the name of each embedded image with get("File Name"), but the resulting path seems to be invalid.
I need to save all the embedded images (inline images) to another location.
Thank you in advance!




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
