tika-dev mailing list archives

From "Damiano (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-1715) Save embedded images into another location
Date Tue, 18 Aug 2015 12:46:46 GMT
Damiano created TIKA-1715:

             Summary: Save embedded images into another location
                 Key: TIKA-1715
                 URL: https://issues.apache.org/jira/browse/TIKA-1715
             Project: Tika
          Issue Type: Test
          Components: metadata
    Affects Versions: 1.10
            Reporter: Damiano

I am having a strange problem dealing with embedded images.
This is my code:

    public void getImages() throws IOException, TikaException, SAXException {

        try (InputStream stream = new FileInputStream(this.fileName)) {

            RecursiveParserWrapper p = new RecursiveParserWrapper(
                new AutoDetectParser(),
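                // The second argument is cut off in the original message; -1 (no write limit) is an assumption.
                new BasicContentHandlerFactory(BasicContentHandlerFactory.HANDLER_TYPE.IGNORE, -1));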
            ParseContext context = new ParseContext();
            PDFParserConfig config = new PDFParserConfig();
            context.set(org.apache.tika.parser.pdf.PDFParserConfig.class, config);
            context.set(org.apache.tika.parser.Parser.class, p);            
            p.parse(stream, new BodyContentHandler(-1), new Metadata(), context);
            List<Metadata> metadatas = p.getMetadata();
            FileInputStream f = new FileInputStream("/tmp/" + metadatas.get(1).get("File Name"));
            //FileInputStream f = new FileInputStream(metadatas.get(1).get("File Name"));
        }
    }

I can get the name of each embedded image with get("File Name"), but the resulting path seems invalid.
I need to save all the embedded images (inline images) to another location.
Thank you in advance!
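
For illustration, the "save to another location" step can be sketched without any Tika classes. This assumes the bytes of each embedded image are available as an InputStream (in Tika this would typically be inside an EmbeddedDocumentExtractor registered on the ParseContext, which is not shown here) and that the name comes from the metadata. The class EmbeddedImageSaver and its methods are hypothetical names, not part of Tika, and the issue does not confirm that this is how the problem was eventually solved.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not a Tika class): copies one embedded stream into a chosen directory.
final class EmbeddedImageSaver {

    private final Path outputDir;

    EmbeddedImageSaver(Path outputDir) {
        this.outputDir = outputDir;
    }

    // Writes the stream to outputDir under a sanitized name and returns the resulting path.
    Path saveEmbedded(InputStream data, String name, int index) throws IOException {
        Files.createDirectories(outputDir);
        Path target = outputDir.resolve(sanitize(name, index));
        Files.copy(data, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    // Reduces a metadata-supplied name to a plain file name so it cannot escape outputDir.
    static String sanitize(String name, int index) {
        String fallback = "embedded-" + index;
        if (name == null) {
            return fallback;
        }
        // Drop any directory part, then replace characters outside a conservative whitelist.
        int lastSeparator = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        String cleaned = name.substring(lastSeparator + 1).replaceAll("[^A-Za-z0-9._-]", "_");
        if (cleaned.isEmpty() || cleaned.equals(".") || cleaned.equals("..")) {
            return fallback;
        }
        return cleaned;
    }
}
```

The stream is copied while it is still open, rather than reopened later by name, because a name reported in metadata is not necessarily a path to a file that still exists on disk.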
